NAVAL  POSTGRADUATE  SCHOOL 
Monterey,  California 


THESIS 


A  COMPARISON  OF  THREE  NUMERICAL 

METHODS  FOR 

UPDATING  REGRESSIONS 

by 

Grigorios  J.  Raptis 

September,  1991 

Thesis  Advisor: 
Thesis  Co-advisor: 

Dan  C. 

William  B. 

Boger 
Gragg 

Approved  for  public  release;  distribution  is  unlimited. 


T260681 


SECURITY  CLASSIFICATION  OF  THIS  PAGE 


REPORT  DOCUMENTATION  PAGE 


1a  REPORT  SECURITY  CLASSIFICATION 
unclassified 


lb  RESTRICTIVE  MARKINGS 


2a  SECURITY  CLASSIFICATION  AUTHORITY 


2b  DECLASSIFICATION/DOWNGRADING  SCHEDULE 


3  DISTRIBUTION/AVAILABILITY  OF  REPORT 
Approved  for  public  release;  distribution  is  unlimited. 


4  PERFORMING  ORGANIZATION  REPORT  NUMBER(S) 


5  MONITORING  ORGANIZATION  REPORT  NUMBER(S) 


6a   NAME  OF  PERFORMING  ORGANIZATION 
Naval  Postgraduate  School 


6b  OFFICE  SYMBOL 
(It  applicable) 
OK 


7a  NAME  OF  MONITORING  ORGANIZATION 
Naval  Postgraduate  School 


6c  ADDRESS  {City.  State,  and  ZIP  Code) 
Monterey,  CA  93943-5000 


7b  ADDRESS  (Ofy,  State,  and  ZIP  Code) 
Monterey,  CA  93943-5000 


8a  NAME  OF  FUNDING/SPONSORING 
ORGANIZATION 


8b  OFFICE  SYMBOL 
(If  applicable) 


9  PROCUREMENT  INSTRUMENT  IDENTIFICATION  NUMBER 


8c  ADDRESS  (City,  State,  and  ZIP  Code) 


10  SOURCE  OF  FUNDING  NUMBERS 


Program  tlemem  No 


Projea  Ni. 


Work  Unit  Acce^iion 
Number 


1 1  TITLE  (Include  Security  Classification) 

A  COMPARISON  OF  THREE  NUMERICAL  METHODS  FOR  UPDATING  REGRESSIONS 


12  PERSONAL  AUTHOR(S) 


Grigorios  J.  Raptis 


13a  TYPE  OF  REPORT 
Master's  Thesis 


13b  TIME  COVERED 
From  To 


14  DATE  OF  REPORT  (year,  month,  day) 
1991,  September 


15  PAGE  COUNT 
95 


16.  SUPPLEMENTARY  NOTATION 

The  views  expressed  in  this  thesis  are  those  of  the  author  and  do  not  reflect  the  official  policy  or  position  of  the  Department  of  Defense  or  the  U.S. 

Government. 


17.COSATICODES 


FIELD 


GROUP 


SUBGROUP 


1 8  SUBJECT  TERMS  (continue  on  reverse  if  necessary  and  identify  by  block  number) 

Hyperbolic  Rotation,  Gram-Schmidt  Factorization,  Reorthogonalization,  Column 
Permutation,  Updating 


19  ABSTRACT  (continue  on  reverse  if  necessary  and  identify  by  block  number) 

Three  numerical  procedures  are  presented  for  updating  regressions.  All  three  methods  are  based  on  QK  factorization,  but  after  that  they  use 
different  philosophies  to  update  the  regression  coefficients.  Elden's  algorithm  updates  using  only  the  triangular  matrix  R.  This  procedure  does 
not  use  orthogonal  transformations,  but  it  uses  hyperbolic  rotations.  The  modified  Gram-Schmidt  QR  process  is  used  by  Gragg-Leveque- 
Trangenstein's  method  where  the  matrix  with  orthonormal  columns  is  stored  and  updated.  Chan's  algorithm  computes  a  column  permutation  TT 
and  a  QR  factorization  of  a  matrix  A  such  that  a  rank  deficiency  of  A  will  be  revealed.  Although  the  three  methods  are  based  on  different  ideas 
and  can  be  used  for  different  purposes  their  comparison  shows  that  Chan's  algorithm  is  the  only  accurate  one  in  the  rank  deficient  case,  and 
thatGragg-Leveque-Trangenstein's  method  is  the  cheapest  and  the  most  stable. 


20.  DISTRIBUTION/AVAILABILITY  OF  ABSTRACT 

Q  UNCLASSIFIED/UNLIMITED  J   SAME  AS  REPORT  J    OTIC  USERS 


21   ABSTRACT  SECURITY  CLASSIFICATION 
Unclassified 


22a  NAME  OF  RESPONSIBLE  INDIVIDUAL 
Dan  C.  Boger 


22b  TELEPHONE  (Include  Area  code) 
(408)646*2607 


22c  OFFICE  SYMBOL 
OR/Bu 


DD  FORM  1473,  84  MAR 


83  APR  edition  may  be  used  until  exhausted 
All  other  editions  are  obsolete 


SECURITY  CLASSIFICATION  OF  THIS  PAGE 

Unclassified 


Approved  for  public  release;  distribution  is  unlimited. 

A  Comparison  of  Three  Numerical 

Methods  for 

Updating  Regressions 

by 

Grigorios  J.  Raptis 

Lieutenant  Commander,  Hellenic  Navy 

B.S.,  Technical  University  of  Athens 

B.S.,  Hellenic  Naval  Academy 

Submitted  in  partial  fulfillment 
of  the  requirements  for  the  degree  of 

MASTER  OF  SCIENCE  IN  OPERATIONS  RESEARCH 

from  the 
NAVAL  POSTGRADUATE  SCHOOL 

Septemb/r  1/991 


ABSTRACT 

Three  numerical  procedures  are  presented  for  updating  regressions.  All  three 
methods  are  based  on  QR  factorization,  but  after  that  they  use  different  philosophies 
to  update  the  regression  coefficients.  Elden's  algorithm  updates  using  only  the 
triangular  matrix  R.  This  procedure  does  not  use  orthogonal  transformations,  but  it 
uses  hyperbolic  rotations.  The  modified  Gram-Schmidt  QR  process  is  used  by  Gragg- 
Leveque-Trangenstein's  method  where  the  matrix  with  orthonormal  columns  is  stored 
and  updated.  Chan's  algorithm  computes  a  column  permutation  n  and  a  QR 
factorization  of  a  matrix  A  such  that  a  rank  deficiency  of  A  will  be  revealed.  Although 
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I .    INTRODUCTION 

A .   BACKGROUND 

Regression  analysis  provides  a  variety  of  modeling 
technigues  that  allows  the  analyst  to  relate  a  dependent 
variable  to  one  or  more  independent  variables.  The 
mathematical  complexity  of  the  model  and  the  degree  to  which 
it  is  a  realistic  model  will  depend  on  how  much  is  known  about 
the  process  being  studied  and  on  the  purpose  of  the  modeling 
exercise.  In  preliminary  studies  of  a  process  or  in  cases 
where  prediction  is  the  primary  objective,  the  models  will 
freguently  fall  in  the  class  of  models  that  are  linear  in  the 
parameters.  That  is,  the  parameters  enter  the  model  as  simple 
coefficients  of  the  independent  variables  or  functions  of  the 
independent  variables.  Such  models  will  be  referred  to  loosely 
as  linear  models.  In  many  statistical  problems,  it  is  useful 
to  express  the  dependent  variable  as  a  linear  function  of  the 
independent  variables.  Furthermore,  regression  analysis  can 
summarize  data  and  guantify  the  nature  and  strength  of  the 
relationships  among  variables.  Regression  analysis  can  also  be 
used  to  predict  new  values  of  the  dependent  variables  based  on 
observed  relationships. 


B.   ANALYSIS  OF  REGRESSION 

As  the  reader  will  see  in  the  following  chapter,  the  main 
purpose  of  this  thesis  is  to  analyze  the  efficiency  and 
numerical  stability  of  some  procedures  for  updating 
regressions.  Most  regression  problems,  however,  require 
decisions  on  which  variables  to  include  in  the  model,  the  form 
the  variables  should  take,  for  example,  X,  X2,  1/X,  etc.,  and 
the  functional  form  of  the  model.  It  is  assumed  that  there  is 
a  set  of  t  candidate  variables,  which  presumably  includes  all 
relevant  variables,  from  which  a  subset  of  r  variables  is  to 
be  chosen  for  the  regression  equation.  The  candidate  variables 
may  include  different  forms  of  the  same  basic  variable,  such 
as  X  and  X2,  and  the  selection  process  may  include  constraints 
on  which  variables  are  to  be  included.  For  example,  X  may  be 
forced  into  the  model  if  X2  is  in  the  selected  subset.  This  is 
a  common  constraint  in  building  polynomial  models. 

There  are  three  distinct  problem  areas  related  to  this 
general  topic: 

■  The  theoretical  effects  of  variable  selection  on  the 
least  squares  regression  results. 

■  The  computational  methods  for  finding  the  "best"  subset 
of  variables  for  each  subset  size. 

■  The  choice  of  subset  size  (for  the  final  model) ,  or  the 
"stopping  rule". 


This  thesis  mainly  discusses  the  second  problem  area  of 
finding  the  more  efficient  and  numerically  stable 
computational  methods.  Also,  we  generally  discuss  the  criteria 
for  the  "best"  subset  size  choice,  in  other  words  the  criteria 
for  the  "stopping  rules". 

The  simplest  linear  model  involves  only  one  independent 
variable  and  states  that  the  true  mean  of  the  dependent 
variable  changes  at  a  constant  rate  as  the  value  of  the 
independent  variable  increases  or  decreases.  Thus,  the 
functional  relationship  between  the  true  mean  of  Yit  E  (YJ  ,  and 
Xi  is  the  eguation  of  a  straight  line, 

E(ri)=p0+Pi(^i>' 

where  f$0  is  the  intercept,  or  the  value  of  E(Yi)  when  X  =  0, 
and  fix  is  the  slope  of  the  line,  or  the  rate  of  change  in  E(Yi) 
per  unit  change  in  X.  The  constants  /J0  and  fix  are  population 
parameters  which  are  to  be  estimated  from  the  sample  of 
observations. 

The  observations  of  the  dependent  variables,  Yir  are 
assumed  to  be  random  observations  from  populations  of  random 
variables  with  the  mean  of  each  population  given  by  E(Yi)  .  The 
deviation  of  an  observation  YL  from  its  population  mean  E(Yi) 
is  taken  into  account  by  adding  a  random  error,  €it  to  give 
the  statistical  model 


r^Po  +  P^+e,.  (1.1) 

The  subscript  i  indicates  the  particular  observation  unit,  i 
=  l,2,...,n.  The  Xi  are  the  n  observations  of  the  independent 
variable  and  are  assumed  to  be  measured  without  error;  that 
is,  the  observed  values  of  X  are  assumed  to  be  a  set  of  known 
constants.  The  YL  and  Xi  are  paired  observations;  both  are 
measured  on  every  observational  unit.  The  random  errors  ed 
are  assumed  to  be  normally,  independently  and  identically 
distributed: [Ref.  1] 

1.   The  Matrix  Model 

Given  the  above  regression  model,  we  can  write  it  in 
matrix  form  as  Y  =  X6  +  e ,  where  Y  is  the  n  x  1  column  vector 
of  observations  of  the  dependent  variable.  The  n  x  p  matrix  X, 
where  x10  =  x20  =  ...  =  xn0  =  1,  will  be  called  the  regression 
matrix,  and  the  x^ ' s  are  generally  chosen  so  that  the  column 
of  X  are  linearly  independent;  that  is  X  has  rank  p.  However, 
in  some  experimental  design  situations  the  elements  of  X  are 
chosen  to  be  0  or  1,  and  the  columns  of  X  may  be  linearly 
dependent.  In  this  case  X  is  commonly  called  the  design 
matrix.  Also,  /?  is  the  p  x  1  vector  of  parameters  to  be 
estimated;  and  e  is  the  n  x  1  vector  of  random  errors. 
Writing  out  the  components  of  Y  =  X/J  +  e  yields 


V 


V*W 


■^io  -^n  ■*; 
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■  I 
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■  ••  *1P 
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Pi 

■ 
■ 

+ 

C2 
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(nxl) 


(nxp') 


(p'xl)   (nxl) 


The  elements  of  a  particular  row,  r,  of  X  are  the 
coefficients  of  the  corresponding  parameters  in  /J  which  give 
E(Yr)  .  The  vectors  Y  and  e  are  random  vectors;  the  elements  of 
these  vectors  are  random  variables.  The  matrix  X  is  considered 
to  be  a  matrix  of  known  constants.  The  vector  ^  is  a  vector  of 
unknown  constants  and  is  to  be  estimated  from  the  given  data. 

Using  the  above  assumptions  on  eir  the  random  vector 
e  has  a  multivariate  normal  distribution  with  a  mean  of  the 
zero  (0)  vector.  The  variance-covariance  matrix  for  any  random 
vector  of  n  elements  is  defined  as  an  n  x  n  symmetric  matrix 
with  the  diagonal  elements  equal  to  the  variance  of  ex  and  the 
off-diagonal  elements  equal  to  the  covariance  between  eL  and 
6j.  Let  Z  be  a  nxl  vector  of  random  variables  zlf  z2,  z3/  ... 
,zn;  the  variance-covariance  matrix  of  Z  is  the  following 
n  x  n  matrix. 


Var(Z)    = 


1    var(zx)       cov(zlf  z2) 
cov(z2,  zx)       var(z2) 

cov(zD,zx)    cov(zn,z2) 


cov(zirzB)) 
cov(z2,zB) 


var(zn) 


(1.3) 


The  variance-covariance  matrix  of  e  is  la2,  and  since 
e  is  independent  and  identically  distributed,  the  distribution 
of  e    is  written  in  shorthand  notation  as 

€-#(0,  Jo2)  , 

where  I  is  the  n  x  n  identity  matrix  and  a2  is  the  common 
variance  of  all  et.  Furthermore,  since  the  elements  of  X  and 
fi  are  constants,  the  X/3  term  in  the  model  is  a  set  of 
constants  being  added  to  the  vector  of  random  errors,  e.  Thus, 
Y  is  a  random  vector  with  mean  vector  X/J  and  variance- 
covariance  matrix  la2: 


E(Y)  =E(X$+t)  =E(X$)  +E(e)  =Xp 


(1.4) 


Var{Y)  =Var(X$+e)  =Var(e)  =Io 


-T«2 


(1.5) 


Var(Y)  is  the  same  as  Var(e)  since  adding  a  constant  to  a 
random  variable  does  not  change  the  variance.  When  e  is 
multivariate  normally  distributed,  Y  is  also  multivariate 
normally  distributed.  Thus, 


Y~N(xp,Io2)  (1-6) 

This  result  is  based  on  the  assumption  that  the  linear  model 
being  used  is  the  correct  model  [Ref.  2:. 68]. 

2 .   The  Normal  Equations 

Before  considering  the  problem  of  estimating  P ,  we 
note  that  we  are  referring  to  the  model  (2),  where  Xi0  is  not 
necessarily  constrained  to  be  unity.  In  the  case  when  Xi0  f  1 
we  have  to  use  a  notation  in  which  i  runs  from  0  to  p-1  rather 
than  1  to  p.  However,  since  the  major  application  of  the 
theory  is  to  the  case  Xi0  =1,  it  is  convenient  to  separate  /J0 
from  the  other  £j's  right  from  the  outset. 

The  oldest  method  of  obtaining  an  estimate  of  /3  is  the 

method  of  least  squares.  It  dates  back  at  least  to  Gauss  and 

consists  of  minimizing  I te Lz   with  respect  to  /3  (see  Fig.  1)  ; 

that  is,  setting  q  =  X/3 ,  we  minimize  eTe1  =  ||  Y  -  q||2  subject 

to  qe!R[X]  =  W,  where  W  is  the  range  space  of  X  {y:y=Xx  for 

some  x}.  If  we  let  q  vary  in  W,  ||  Y  -  q|| 2  (the  square  of  the 

length  of  Y  -  q)  will  be  a  minimum  for  q  =  q*  when  (Y  -  q*)  is 

orthogonal  with  W  (see  Fig.  1) . 

Thus  m  ...  „. 

XT(Y  -  q*)    =0  (1-7) 

or 


1  We  denote  the  transpose  of  a  vector  y  by  yT,  and 
similarly  for  a  matrix  transpose. 


Figure  1.  The  method  of  least  squares  consists  of  finding 
A  such  that  AB  is  a  minimum. 


xTq*  =  xTY.  (1.8) 

Here  q*  is  uniquely  determined,  being  the  unique  orthogonal 
projection  of  Y  onto  W.  Now  given  that  the  columns  of  X  are 
linearly  independent,  there  exists  a  unique  vector  (0)  such 
that  qA  =  x(3.  Therefore  substituting  in  (1.8)  we  have 


XTX   P  =  XTY 


(1.9) 


the  so-called  normal  equation(s).  At  this  point  we  assume  that 
X  has  rank  p,  so  XTX  is  positive  definite  and  therefore 


nonsingular.  If  so,  equation  (1.9)  has  a  unique  solution, 
namely, 

P  =  {XTX)'X  XTY  (1-10) 

Here  (5  is  called  the  ordinary  least  squares  estimate  of  /?. 
Computational  methods  for  calculating  (5  are  given  in  the 
following  chapters.  Notice  that  (5  can  also  be  obtained  by 
writing 

eTZ=(Y-X$)T{Y-XV)    =   YTY-2PTXTY+$TXTXP 

using  the  fact  that  0TXTY  =  (£TXTY)T  =  YTX0 ,  and  differentiating 
eTe  with  respect  to  /? .  Thus  from  deTe/dfl   =  0  we  have 

-2XTY+2XTXQ   =  0 

or 

XTX$    =  XTY. 

This  solution  for  /J  gives  us  a  stationary  value  of  eTe  ,  and 

a  simple  algebraic  identity  confirms  that  (5  is  a  minimum 

[Ref.  2]. 

The  multiplication  XTX  generates  a  p '  x  p1  matrix 
where  the  diagonal  elements  are  the  sums  of  squares  of  each  of 
the  independent  variables  and  the  off-diagonal  elements  are 
the  sums  of  products  between  independent  variables.  The 
general  form  is 


XTX 


11  ^1X11  X)  X12 

/„  xn  2L,  xn     z^i  xnxi2 

2^,xi2  2^iXiixi2     zL  X12 

2^xip  2^xnxip  2^lxi2Xip 


Z*i  x-i.ixiP 

Zs  Xl2Xlp 


E* 


ip 


(1.11) 


Summation  in  all  cases  is  over  i  =  1  to  n,  the  n  observations 
in  the  data.  When  only  one  independent  variable  is  involved, 
XTX  consists  of  only  the  upper-left  2x2  matrix.  Inspection 
of  the  normal  equations  will  reveal  that  the  elements  in  this 
2x2  matrix  are  the  coefficients  of  0O  and  0X  . 

The  elements  of  the  matrix  product  XTY  are  the  sums  of 
products  between  each  independent  variable  in  turn  and  the 
dependent  variable: 


Y,xnYi 
Y.xi*Yi 


XTY     = 


(1.12) 


The  first  element,  Y.Yif  is  the  sum  of  products  between  the 
vectors  of  ones  (the  first  column  of  X)  and  Y.  As  is  mentioned 
above,  if  only  one  independent  variable  is  involved,  XTY 
consists  of  only  the  first  two  elements. 
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The   fitted   regression   Y  =  X$ ,   is   denoted   by 

Y   (  =  [(£4)]) ,  and  the  elements  of 

e  =  Y-Y  =  Y-X$    =     (In-X(XTX)1  XT)Y    =     (IQ-P)Y,    say, 

are  called  the  residuals.  The  minimum  value  of  6Te,  namely, 

eTe  =   (Y-X$)T(Y-X$) 

=   Y*Y-2tTXTY+brXTxQ 

=  YTY-$TXTY+$T[XTX$-XTY] 

=   YTY-PTXTY  =   YTY-$TXTX$,  (1-13) 

is  called  the  residual  sum  of  squares  (RSS)  .  Notice  that  Y   and 
e  are  uniquely  defined  by  (5  . 


C.   VARIABLE  SELECTION  IN  LEAST  SQUARES  ANALYSIS 

The  purpose  of  the  least  squares  analysis  will  influence 
the  manner  in  which  the  model  is  constructed.  Hocking  [Ref .  4] 
relates  six  potential  uses  of  regression  equations  given  by 
Mallows  [Ref.  3] : 

■  Providing  a  good  description  of  the  behavior  of  the 
response  variable 

■  Prediction  of  future  responses  and  estimation  of  mean 
responses 

11 


■  Extrapolation,  or  prediction  of  responses  outside  the 
range  of  the  data 

■  Estimation  of  parameters 

■  Control  of  a  process  by  varying  levels  of  input 

■  Developing  realistic  models  of  the  process. 

Assume  that  the  correct  model  involves  t  independent 
variables  but  that  a  subset  of  p  variables,  chosen  randomly  or 
on  the  basis  of  external  information,  is  used  in  the 
regression  eguation.  Let  Xp  and  /?p  denote  submatrices  of  X  and 

P   that  relate  to  the  p  selected  variables.  |3p  will  denote  the 

least  sguares  estimates  of  /?p  obtained  from  the  p-variate 
subset  model  and  MS(Resp)  the  mean  sguares  residual  obtained 
from  the  p-variate  subset  model.  After  the  above  the  following 
properties  are  summarized: 

■  MS(Resp)  is  a  positively  biased  estimate  of  a2  unless  the 
true  regression  coefficients  for  all  deleted  variables  are 
zero. 

■  0p  is  a  biased  estimate  of  0p  and  less  variable  than  the 

corresponding  statistics  obtained  from  the  t-variate  model 
[Ref.  4]. 

Stepwise  regression  methods  are  variable  selection  methods 
which  identify  good  (although  not  necessarily  the  best)  subset 
models,  with  considerably  less  computing  than  required  for  all 
possible  regressions.  The  subset  models  are  identified 
sequentially  by  adding  or  deleting,  depending  on  the  method, 
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the  one  variable  that  has  the  greatest  impact  on  the  residual 
sum  of  squares.  These  stepwise  methods  are  not  guaranteed  to 
find  the  "best"  subset  for  each  subset  size,  and  the  results 
produced  by  different  methods  may  not  agree  with  each  other. 

D.   RESEARCH  METHODOLOGY 

Given  the  regression  model  Y  =  X/?  +  e ,  where  X  is  n  x  p, 
a  number  of  computational  techniques  have  being  suggested  for 
solving  the  following  steps: 

■  Solve  the  normal  equations  XTX  fl  =  XTY . 

■  Calculate  the  residual  e  =  Y-X$ . 

■  Calculate  the  residual  sum  of  squares  RSS  =  eTe. 

■  Update  the  regression  model  (that  is,  add  or  remove  a 
row  of  X) . 

■  Add  or  remove  a  regressor  (that  is,  add  or  remove  a 
column  of  X. ) 

■  Calculate  an  F-statistic  for  a  general  linear 
hypothesis. 

Solving  the  above  steps  is  a  common  problem  in  a  computer 
laboratory.  These  problems  arise  in  a  variety  of  areas  and  in 
a  variety  of  contexts.  Linear  least  squares  problems  are 
particularly  difficult  to  solve  because  they  frequently 
involve  large  quantities  of  data,  and  they  are  frequently  ill- 
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conditioned2.  The  research  methodology  will  be  to  describe 
acceptable  procedures  for  updating  regressions.  The  procedures 
to  be  chosen  should  be  economically  attractive  and  numerically 
accurate,  in  the  case  that  the  number  of  observations  in  the 
regression  is  large  compared  to  the  number  of  variables  in  the 
regression  model.  Each  of  these  procedures  (algorithms)  has 
advantages  and  disadvantages  which  will  be  explained  below. 


E.   THESIS  ORGANIZATION 
1.   General 

The  main  purpose  of  this  thesis  is  to  bring  to  the 
attention  of  readers  numerically  acceptable  procedures  for 
adding  and  deleting  rows  and  columns  from  a  regression  model. 
For  the  case  of  a  single  row  to  be  inserted  or  deleted,  the 
algorithms  use  simple  techniques:  plane  or  hyperbolic 
rotations  and  the  Gram-Schmidt  process.  The  algorithms  for 
inserting  rows  are  stable,  but  the  problem  of  deleting  a  row 
may  be  ill-conditioned  and  some  algorithms  for  this  process 
may  be  numerically  unstable. 

The  need  for  updating  regression  results  arises  for 
various  statistical  or  numerical  reasons.   When  data  are 


2A  set  of  linear  equations  Bx  =  c  is  said  to  be  ill- 
conditioned  if  small  errors  or  variations  in  the  elements  of 
B  and  c  have  a  large  effect  on  the  solution  x. 
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arriving  sequentially,  it  may  be  undesirable  or  impossible  to 
wait  for  all  the  data  before  obtaining  some  regression 
results.  In  various  time-series  problems,  one  is  interested  in 
the  changing  relationships  between  variables.  A  regression 
model  with  a  fixed  number  of  lagged  terms,  as  it  moves  over  a 
series  of  data,  generates  a  "window"  on  the  sample,  with  a  new 
observation  added,  and  an  old  one  deleted3,  as  the  window 
moves  to  the  next  point  in  the  series  [Ref .  5] . 

2.   Basic  Schedule 

The  procedures  for  updating  regressions  that  we  shall 
discuss  here  are  the  following: 

■  An  algorithm  based  on  the  normal  equations 
(Efroymson  M.  A.,  1960[Ref.  6],  Draper  N.  and  Smith 
H.  ,  1966[Ref.  7]);  this  had  been  the  standard 
introductory  approach  in  regression  courses.  In  this  algorithm 
the  regression  coefficients  are  solved  using  Gauss-Jordan 
elimination  on  the  normal  equations.  However,  the  problem  of 
solving  the  normal  equations  is  frequently  much  more  ill- 
conditioned  than  the  original  problem  of  solving  the 
overdetermined  linear  equations. 

■  Stepwise  regression  analysis  with  orthogonal 
transformations    (Lars    Elden    1972) [Ref .  8].    In    this 


3But  in  this  case  the  problem  can  be  ill-conditioned, 
because  the  rank  can  be  decreased  by  this  operation. 
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algorithm  a  method  is  presented  using  QR-decomposition.  In  the 
QR-decomposition  or  factorization,  we  express  a  matrix  as  a 
product  of  an  orthogonal  matrix  Q  and  an  upper  trapezoidal 
matrix  R  (actually  R  stands  for  right  trapezoidal) .  Many  of 
the  ideas  used  in  this  method  have  been  proposed  by  Golub 
(1965)  [Ref.  9]. 

■  The  modified  Gram-Schmidt  triangular  factorization 
(W.  Gragg,  R.  Levegue,  J.  Trangenstein  1978) 
[Ref.  10].  This  approach  is  A  =  QR  factorization  with 
Q  having  orthonormal  columns  and  R  upper  triangular.  In  this 
algorithm  one  is  able  to  add  or  drop  regressors  (columns  of  A) 
or  observations  (rows  of  A)  using  essentially  only  the  storage 
needed  for  A  and  to  secure  numerical  accuracy  possibly  at  the 
expense  of  additional  computation  and  storage. 

■  Rank  revealing  QR-factorization  (T.  Chan,  Hansen 
1986) .  In  this  algorithm  a  method  is  presented  for  computing 
a  column  permutation,  II  \  and  a  QR  factorization,  AD  =  QR,  of 
an  n  by  m  (n  >  m)  matrix  A  such  that  a  possible  rank 
deficiency5  of  A  will  be  revealed  in  the  triangular  factor  R 
having  a  small  lower  right  block.  Notice  that  a  matrix  A  is 
rank  deficient  with  rank  deficiency  d  if  it  has  at  least  d 


AII  is  a  permutation  matrix  where  IIe!Rnxn  and  is  the  product 
of  P1P2...Pn-i.  Notice  that  Pi  denotes  the  matrix  representation 
of  the  column  interchange  that  precedes  step  i. 

5This  means  that  the  columns  of  the  observation  matrix  A 
are  not  linearly  independent. 
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singular  values.  For  matrices  of  low  rank  deficiency,  the 
algorithm  can  reveal  the  rank  of  A,  and  the  cost  is  only 
slightly  more  than  the  cost  of  one  regular  QR  factorization. 
A  posteriori  upper  and  lower  bounds  on  the  singular  values6 
of  A  can  be  used  to  infer  the  numerical  rank  of  A. 

In  Chapter  II  we  discuss  the  general  theory  about 
updating  regressions.  A  process  for  adding  and  dropping 
regressors  follows.  Technigues  are  presented  with  stepwise 
regression  in  mind,  and  we  discuss  how  to  compute  the  various 
guantities  of  statistical  interest  using  the  algorithms. 

Chapter  III  mainly  covers  computational  details  of  the 
algorithms,  which  are  used  to  compute  the  regression 
coefficients.  In  this  chapter  also  a  simulation  is  applied  to 
the  basic  algorithms  for  validating  the  models  by  creating 
useful  numerical  results.  In  the  same  chapter  we  discuss  the 
measures  of  effectiveness  of  each  model  based  on  the 
previously  generated  numerical  results.  Chapter  IV  concludes 
the  thesis  and  offers  recommendations. 


6The  singular  values  of  a  matrix  A  are  the  "diagonal 
elements"  of  S,  one  of  the  three  component  matrices  that  A 
splitted  to  avoid  singularity. 
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II.    BACKGROUND  THEORY 

A.   GENERAL  THEORY  FOR  FITTING  A  SPECIFIED  REGRESSION 

Several  techniques  can  be  used  to  solve  the  problem  of 
updating  regressions.  Efroymson's  algorithm  [Ref.  6]  is  one  of 
the  earlier  ones  used.  It  uses  the  method  of  Gauss-Jordan 
elimination  on  the  normal  equations  ATAx  =  ATb.  A  more 
efficient  algorithm  is  to  use  the  Cholesky  factorization  of 
the  Gram  matrix  ATA  into  RTR  where  R  is  upper  triangular.  The 
solution  to  the  original  system  is  then  found  by  a  two  step 
triangular  solve  process: 

R  Ty   =  A  Tb,         Rx  =  y        —        A  TAx  =  RTy  =  ATb . 

Another  way  to  solve  the  above  problem  utilizes  orthogonal 
transformations.  This  approach,  called  QR  factorization,  is 
the  basis  of  the  thesis  algorithms  and  may  be  slightly  slower 
than  the  normal  equation  approach,  but  is  more  stable 
numerically.  The  QR  factorization  of  a  matrix  can  be  computed 
using  the  following  methods: 

■  Classical  Gram-Schmidt  (CGM) . 

■  Modified  Gram-Schmidt  (MGS) . 

■  Givens  rotations. 

■  Householder  reflection  (Elden's  algorithm). 
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■  QR  with  Column  Pivoting  for  the  Rank  Deficiency  case 
(Chan's  algorithm). 

■  Daniel-Gragg-Kaufman-Stewart  method  (Gragg-Levegue- 
Trangenstein ' s  algorithm). 

The  latter  algorithm  uses  rotations  and  the  Gram-Schmidt 
process  with  reorthogonalization. 

Applying  the  QR  technigue  on  a  sguare  nonsingular  system 
Ax  =  b  we  have  (QR)x  =  b  and  by  the  associative  law  of  matrix 
multiplication  Q(Rx)  =  b.  Defining  y  =  Rx,  it  follows  that  Qy 
=  b.  Hence,  Ax  =  b  is  solved  in  two  steps: 

1.  Solve  Qy  =  b  for  the  unknown  y. 

2.  Solve  Rx  =  y  for  the  unknown  x. 

The  second  system  is  solved  by  back  substitution.  The  first 
system  is  easy  to  solve  since  Q  is  orthogonal.  Multiplying  Qy 
=  b  by  QT  yields  QTQ  =  QTb.  Since  Q  is  orthogonal,  QTQ  =  I  and 
y  =  QTb.  When  A  has  more  rows  than  columns  and  is  of  full  rank 
the  above  process  yields  the  solution  of  the  least  sguares 
problem  ||  b  -  Ax||  22  =  minimum.  The  key  to  the  numerical 
stability  is  the  orthonormality  of  the  columns  of  Q.  To  avoid 
loss  of  orthogonality  Gragg-Levegue-Trangenstein1 s  algorithm 
applies  reorthogonalization  whenever  ||u||/||u||  is  small  (say, 
less  than  J 2/2)  ,  where  u  and  u  is  the  vector  to  be 
orthogonalized  and  its  orthogonal  projection,  respectively. 

Finally  if  A  is  rank  deficient  then  the  QR  factorization 
need  not  give  a  basis  for  range  (A).  This  problem  can  be 
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corrected,  as  mentioned  below,  by  computing  the  QR 
factorization  of  the  column  pivoted  version  of  A,  AD  =  QR 
where  II  is  a  permutation,  and  applying  Chan's  rank  revealing 
scheme . 

In  the  least  sguare  problem,  statisticians  and  numerical 
analysts  use  different  notations  for  the  same  entities,  i.e., 
the  matrix  of  the  observations  of  independent  variables,  the 
vector  of  the  dependent  variables,  the  vector  of  random 
errors,  etc.  To  avoid  misunderstandings  caused  by  using  both 
notations  in  the  following  chapters,  we  built  the  following 
TABLE  I  of  notational  correspondents. 

Most  numerical  analysts  use  n  to  represent  columns  of  a 
matrix  and  m  to  represent  rows.  Others  interchanges  m  and  n, 
and  we  will  use  this  notation. 
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TABLE  I 
NOTATIONS  USED  IN  REGRESSION  ANALYSIS 

DESCRIPTION 

STATISTITIANS 

N.  ANALYSTS 

Independent  var.  matrix 

X 

A 

Dependent  var.  col.  vector 

y 

b 

Vector  of  parameters  to  be 
estimated 

P 

X 

Rows  of  indep.  var.  matrix 

n 

n 

Cols  of  indep.  var.  matrix 

P 

m 

Vector  of  random  errors 

€ 

r 

Unique  solution  to  the 
normal  equation 

P 

X 

The  vector  of  estimated 
means  of  the  dep.  var. 

Y 

P7 

The  idempotent  n  x  n  matrix 

P 

QQT 

The  residuals  vector 

e 

r 

Residual  sum  of  squares 

SS(Res) 

rTr 

Siqnific.  level  enter/stay 

pa/b 

V 

7The  numerical  analysts  describe  the  vector  p 
as  the  "orthoprojector  onto  R (A) " . 


=  Ax  =  b-r, 
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B.   METHODS  FOR  UPDATING  REGRESSIONS 

Before  starting  this  section,  let  us  introduce  some 
terminology.  The  upper  case  letters  denote  matrices,  lower 
case  letters  denote  vectors,  and  Greek  letters  denote  scalars. 
Thus  we  write 


x=p  = 


V 

rp; 

fPxN 

V 

.      1      \ 

• 

= 

■ 

,  b=y= 

■ 

= 

■ 

,   A=X= 

a               ■ 
■               ■ 
•               • 

i         ■ 

= 

■                 ■ 

•                 ■ 

vTpJ 

.Pp> 

Jp, 

^p, 

[anl     ' 

'          nn, 

V^ai     * 

1  ■      ^nn/ 

(2.1) 


following  the  notational  correspondents  of  TABLE  I.  For 
notational  convenience  we  set  x  =  P  =  b  and  the  normal 
eguations  can  now  be  written  Xb  =  y  or  Ax  =  b. 


1.   Lars  Elden  (1960) 

In  Elden 's  algorithm  the  upper  triangular  matrix  R  in 
the  QR-decomposition  of  A  (in  the  model  Ax  =  b)  is  determined 
by  successive  Householder  transformations  so  that  after  k 
steps  A  has  the  form 


Vm  ■  ■  •  V 


(R  <«  i 
0 

(k) 


(*) 


aik) 


(« 


4> 

(1)   (n-k-l) 


\)k 
}  n-k 


R<k)   is  upper  triangular  and     T( k)     is  k  x    (m  -  k) .    Furthermore, 
Pk^  =  I-2vk+1y^Xl    where  Wk+1  with    Iv**^  =  1    is   now  chosen  so 
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that  k  first  rows  are  left  unchanged  and 


■Pjt+i 


(    0 


=  Xek,     where     \X\  =  \a£:l\2. 


After  m  steps  X  will  be  completely  triangularized 


PJ>n_x    ...  PXA  = 


In  stepwise  regression  analysis,  however,  the  final 
result  may  be  of  the  form 


QTA  = 


2J  (p)    rp  (p) 

0    A  W  J ' 


(2.2) 


where   R(p)   is   an   upper   triangular   p   x   p   matrix.   The 
corresponding  right  hand  side  and  the  residual  vector  are: 


QTy  = 


(c{**\  (    0  \ 

I,  and    z   =   ,  , 


The  regression  coefficients  are  calculated  from  the 
system  R(p)b  =  c(p).  In  most  cases  the  column  vectors  of  X  are 
not  added  to  the  subspace  in  the  natural  order,  so  that  only 
after  a  permutation  of  columns  is  QTA  of  the  form  shown  in 
equation  (2.2) . 

a.  Choice  of  Vector  for  Enlarging  the  Subspace. 

We  shall  add  to  the  subspace  the  vector  that  will 
make  the  norm  of  the  residual  vector  decrease  as  much  as 
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possible.  Since  only  the  m  -  k  last  rows  of  the  matrix  are 
changed  in  the  kth  step  we  can  without  loss  of  generality 
restrict  ourselves  to  considering  the  first  step  of  the 
analysis.  We  have  the  linear  system  of  equations  Ax  =  b,  with 
A  and  b  as  in  (2.1).  Let  ad  =  (a^,  a2j ,  .  ..,  anj)T  denote  the 
jth  column  vector  of  A.  We  shall  multiply  (A  i  b)  by  an 
orthogonal  matrix  P  =  I  -  2wwT,  where  ||w||28  =1,  so  that  for 
some  j  we  have  Pa0  =  ae1#  where  |A|  =  ||aj||2.  After  this 
transformation  the  residual  vector  is  given  by  the  last  n  -  1 
components  of  Pb  and  since  we  were  to  decrease  the  norm  of  the 
residual  vector  as  much  as  possible,  j  shall  be  chosen  so  that 
the  first  component  (Pb):  of  Pb  will  be  as  large  as  possible. 
Now  we  have  (Pb)i  =  (Pb)^  =  (Pb)1;^/*  =  bTPTPa.j/A  =  bTajyA. 
Thus  we  shall  select  j  so  that  the  quantity 


will  be  as  large  as  possible  [Ref.  9]. 

b.  Choice  of  Vector  for  Diminishing  the  Subspace. 

Suppose  that  after  a  number  of  steps  (A  :  b)   has  the 
form 


8The  Euclidean  length  of  a  vector  is  often  denoted  ||x||2, 
also  called  the  2-norm,  since  the  components  of  x  are  raised 


to  the  second  power, 


=  s[xi 


2  ,  ..2  ,      ,   2 


||x||2    =   dXx  +X2  +  .  .  .  +Xn  . 
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(2.3) 
{    0    A<«   d(«J 

where  R(k)  is  an  upper  triangular  (k  x  k)  matrix.  Let  rp  =  (plp, 
p2p,  ...  ,  ppp,  0,  ...  ,  0)T  denote  the  pth  column  vector  of  Rk. 
Now  if  the  j_th  column  vector  of  A  is  removed  we  can  by 
successive  rotations  in  the  (i ,  i  +  1)  ,  (i  +  1 ,  i+2)  . . . . , fk-1 ,  k) 
planes  transform  Rk  to  triangular  form  apart  from  the  j_th 
column.  That  is,  we  compute  the  coordinates  of  the  remaining 
column  vectors  of  A  in  the  orthonormal  basis  where  the  jth 
base  vector  has  been  excluded. 

Let  Qj,j*i   denote  the  orthogonal  rotation  matrix  in 

the  (j  ,  i  +  1)  plane  which  deviates  from  the  unit  matrix  only  in 
the  elements  g\j  j  =  qj+1  j+1  =  cosOj  and  qj>j+1  =  -<2j+i,j  =  sinGj.  If 
8j  is  chosen  so  that 

cose,- *j-j+x  ,       sln6,= Pj*i>J*i 

J     r~ 2 2 J   r~2 2 

we  get,    0j,i+i*>+1  =  QjJ*x(Pi.j*i»  ■  •  •Pj.j+i'Pj+i.j+i'0'  •  -/0)r  = 

=  lPi,j+i'  ■  ■  •'Pj-i,j+i'Pj,J+i'0'0'  ■  •  ■/°)T/  where 

/     r~2    2 
Pj.j+i  ~  \P],j+i+Pj+i,j+i  - 

Further  we   have 

Qj,j*i  rj  -  iPi,jr  •  •  • '  Pj-i.j'  Pi. J»  Pj+i,j'Qi  ■  ■  ■  /  o; 
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where  p '  d  ti   =  cosG^  pJd   and  P'j+i,j  =  -sinGj  pd  d.  After  k  -  i  -  1 
rotation  we  have  transformed  Rk  to  the  form 


(9 


*o   =  C^ 


11 


Pi.i- 


Pi.i     Pi.^+1  ■ 


Pi-i,i-i   Pj-i,j   Pi-i,^+i 


p'j., 


9  k.  j 


Pi.i+i   • 
0 


Pi,*  "l 


Pi-i,Jt 
9j.k 


Pjt-l.Jk 

0 


where  Qk   =  Qt-i,*  /  ■  ■  •  /  Oj,;/+i  • 


The  preceding  can  be  illustrated  by  the  following  schematic 
example  (k  =  5,  j  =  3) 


R   = 


X   X   X   X   X 

X   X   X   X 

XXX 

X   X 

X 


rotation 
on   (3,4) 


lx  x  x  x  x" 


X  X    X    X 
XXX 

+  □  X 


rotation 
on   (4,5) 


lX  X  X  X   x^ 
X   X   X    X 
XXX 
X  X 

*     n) 


where  x  =  any  nonzero  element,  +  =  nonzero  element  being 
introduced,  D  =  element  being  annihilated. 

we 

multiply  by  the  same  rotation  matrices  from  the  left  as  in  the 
following  schematic  example. 


To  update  the  transposed  inverse  R"T  =  XT 


26 


XT     = 


X 

X  X 

XXX 

X   X   X   X 

X   X   X   X   X) 


rotation 
on   (3,4) 


x 

X  X 

X   X  □    + 

X   X  X    X 

\X   X  X    X   X) 


rotation 
on   (4,5) 


x 

X  X 
XX  X 

X   X  □   X    + 

X   X  X    X   X 


Since  Rok)    can  be  transformed  to  triangular  form  by  permuting 

its  columns,  it  is  easy  to  see  that  (J?0(A:))   will  be  triangular 

if  the  same  permutation  is  performed  on  its  rows.  Therefore 
the  jth  column  of  Qk(Rck))~T  =  (R0(k))~T  has  only  one  nonzero 
element.  Let  x5  =  (0,  ...,  0,  £dj/  ...,  Cjk)T  denote  the  jth 
column  vector  of  (R(k))~T.  Then  QkXj  =  Aek.  Since  ||Xj||2  =  ||QkXj|| 
we  have  |  A  |  =  ||Xj||2.  The  right  hand  side  of  (2.3)  is  multiplied 
by  Qk  and  as  the  dimension  of  the  subspace  is  decreased  the 
increase  of  the  norm  of  the  residual  vector  will  come  from  the 
last  component  (Qkc(k))k  of  Qkc(k).  But 


(Qkc^)k  =  e£Qkc 


k*k' 


(« 


kekTQkc  <*>        x?QkTQkc  «        x/e  (« 


Thus    in  each   step  we  have   to   find   for  which  value   of  j_ 


VJ     - 


(x/cW)a 


is  minimal,  and  if  for  this  j.  the  increase  of  the  norm  of  the 
residual  vector  is  not  significant  the  jth  column  vector  of  X 
is  removed  from  the  subspace  (the  variable  Xj  is  deleted  from 
the  regression) . 
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c.  Inserting  a  Row  (Observation) . 

In  some  applications  it  may  be  desirable  to  insert 
a  row  to  the  matrix  of  observations  after  the  analysis  is 
finished  and  study  what  influence  that  row  can  have  on  the 
regression  coefficients.  This  can  be  done  without  having  to 
start  from  the  beginning  again  using  the  QR-factorization  that 
has  already  been  done.  The  procedure  can  be  illustrated  by  the 
following  somewhat  simplified  example.  Suppose  we  have  the 
system  of  equations  Rx  =  c,  where 


R  = 


Pn    P12 
P22 


Pifl 
P2fl 


an/ 


V 


,        C    = 


\1nJ 


The  system  is  augmented  by  a  row  (an  observation) : 


'  Pll      P12 

■             ■ 

•         Pi, 

'  YiN 

P22 

■             ■ 

■          P2n 

Y2 

• 

■ 

■ 

*o    = 

■ 

■                          • 

H  nn 

co    = 

■ 

Yn 

^aa+l,l     an+l,2 

•             ■ 

■     *«+i#a> 

kPii+i, 

If  we  multiply  [R0  i  c0]  by  a  rotation  matrix  Qln+1  (Qj  in  the 
(1,  n+1)  plane  where  8X  is  chosen  so  that 
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cos  91  =  pxl  /  yp?i+aa+lfl        ,        sin  Gx  =  afl+1#1  /  /p[^ 


we   obtain 


(i)    _ 


Vl,n+i   -"o 


(1) 


Pll 


Pl2 


'22 


a(1) 

an+l,2 


Pin 
P21, 


a(1) 


where 


(i)   _ 


pi/j  =  cos  et  pltj  +  sin  0!  aa+1#i  j=l,  .  .  .  ,n 


(i) 


an+i,j  -  -sin  6X  plfJ  +  cos  G1  afl+1  tj         j  =  2,  ...  ,n. 

After  n  successive  rotations  in  the  (1,  n+1),  (2,  n+1) , 
(n,  n+1)  planes 


Pll        Pl2 
P22 


(X.) 


*0  -    Qn.n.l     <©a>      '   •   '     0i.a*i     (©i>     *0    = 


Pin 
p2n 


Pun 


c0  has  been  transformed: 

Now  we  have  the  system   R0(n)  x  =  (cx(2) ,  c2(2),  .  ..,  cn(n))T 
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(the  component  bn+1Cn)  belongs  to  the  residual  vector)  . 

d.  Removing  a  Row  (Observation) . 

Sometimes  it  may  be  desirable  to  remove  a  row  of 
the  matrix  of  observations  after  the  solution  has  been 
obtained.  This  can  be  done  in  a  very  simple  manner  without  the 
analysis  having  to  be  done  from  the  beginning  again.  Suppose 
that  the  matrix  A  of  observation  has  been  decomposed: 


«•'*  ■  o  -  Qlb  -  3  (2-4) 


Let  ay  {v  =  l,...,m)  denote  the  row  vectors  of  X.  Then  the 
normal  equations  of  the  system  Ax  =  b  can  be  written  in  the 
following  form: 

m  m 

v=l  v=l 

If  ap  is  the  row  of  A  that  is  going  to  be  removed,  let 

/  R 


S  = 


\iap 


where    i2  =  -1 


Then   ST   S    =   RT   R   -   aDT   aD   =   RT   QAT  QA  R   -    aDT   aD   = 


=   AT   A   -    apT   ap   = 


m 

=    E  avTav 

v  -1 ,  v  *p 
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It  is  easy  to  see  that  if  the  vector  c  in  (2.4)  is  augmented 
by  a  component  ibp,  the  corresponding  right  hand  side  of  the 
normal  equation  will  be 


v-l,  v*p 


S   can   be   brought   to   triangular   form   by   successive 
multiplications  by  matrices  T1#n+1,  T2 >n+1,  ...,Tnn+1  where  Tv>n+1  is 
not  unitary  but  complex  orthogonal  (Tu  n+12  =  I)  .  Consider  the 
first  step: 
Let 


(9 


T  = 


/ 


«2        ~2 
Pll-ffpl 


11 


Vi<XPi 


la  ,\ 


•PiiJ 


(2.5) 


Then 


*i,«  3     = 


Pll       Pl2        • 
P22        ' 


0      ia 


P2 


pL,n 

P2a 


■      ■      ■*  &pnl 


where 
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pi 


,     _    i(aplpirPllxpj) 
*Pj a       ^ J  -  2 ,  .  .  . ,  n . 

yPii  ~api 
After   n   transformations   S   has   been  brought   to   the    form 


0  ■ 


and  the  right  hand  side  to  the  form 

(  c"\ 
\±bp) 

where  we  denote  with  double  prime  (")  the  final  components  of 
the  normal  equations.  Then  we  have  the  system  R" z  =  c"  to 

solve.  Notice  that  the  component  ib'p'   belongs  to  the  residual 

vector.  The  hyperbolic  rotations  that  are  used  to  update  the 
regression  coefficient  by  working  only  on  the  matrix  R  are  not 
unitary.  Non-unitary  transformations  can  destroy  numerical 
stability. 

2.   Gragg-Leveque-Trangenstein  (1978) 

These  algorithms  are  based  on  the  use  of  (orthogonal) 
plane  rotations  and  the  modified  Gram-Schmidt  process  with 
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reorthogonalization.  We  compute  the  following  quantities  of 
interest  in  a  regression  analysis,  with  stepwise  regression: 

■  The  least  squares  coefficients  b, 

■  The  residuals  r  =  y  -  Xb, 

■  The  ANOVA  table9, 

■  The   covariance   matrix   for   the   regression 
coefficients,  and 

■  Partial  F-tests  for  the  significance  of  a  given 
variable  in  the  regression. 

a.  Stepwise    Regression:    Adding    and    Deleting 
Regressors 

Let  us  have  entered  k  -  1  columns  of  X  in  the 
model  y  =  Xb,  and  we  wish  to  add  the  next  column.  This  means 
that  we  want  to  find  a  k  x  k  matrix  Q*  whose  columns  form  an 
orthonormal  basis  for  the  span  of  the  k  columns  of  X,  assuming 
that  Qk-1  is  known.  The  remaining  columns  of  X  are  regressed 
upon  the  first  k  columns.  Let  ^  be  a  k  x  k  triangular  matrix 
which  maps  the  orthonormal  basis  into  the  original  basis.  Also 
rk  is  the  orthogonal  projection  of  the  vector  y  of  dependent 
variable  onto  the  space  orthogonal  to  the  space  of  the  first 
k  columns.  In  other  words,  rk  is  the  residual  vector. 
Mathematically,  with  k  -  1  columns  entered,  we  have  factored: 


9Here,  the  residual  sum  of  squares  is  computed  directly, 
and  not  by  subtracting  the  sum  of  squares  due  to  regression 
from  the  total  sum  of  squares. 
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(     \ 

t 

X  y 

= 

\       ) 

v 

@k-l    At-1    rk-l 


Rk-1    **-l    Ck_^ 


0 

or 


X 


(2.6) 


where      Qii.1      is   n   x    (k-1)    and 

Qk-^Qk-i  -i;  (2.7) 

also  the  columns  of  Ak_!  and  rk_!  are  orthogonal  to  the  columns 
of  Qi-!.  This  means  that 


Qk-iTAk-i=0>      QK-iTrk-i=°; 


(2.8) 


Also   Rj.-!   is  (k-1)  x  (k-1)  upper  triangular;  and 


ck-i=Qk-iTy- 


(2.9) 


The  above  procedure  was  a  step  of  the  Gram-Schmidt 
process  that  can  be  viewed  as  a  partial  QR  factorization. 
Although  there  are  some  similarities  between  the  Gram-Schmidt 
factorization  and  Householder's  factorization,  there  are  also 
some  important  differences.  We  are  going  to  discuss  that  in 
the  next  chapter  during  the  comparison  process. 

To  update  the  above  factorization  (i.e.,  entering 
the  kth  column  or,  in  other  words,  to  add  the  kth  variable) , 
we  perform  the  following  steps: 

■  Repartition: 
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/      \ 

t 

X  y 

= 

\        J 

v 

£*-l  ak   Ak    rk-l 


Rk-i  8k  Tk  Ck-i 

0T  1  0T  0 

0  0  J  0 

0T  0  0T  1 


(2.10) 


Normalization,  and  the  equation  (2.10)  becomes: 


<?*-i 


M 


ak     Ak 


•jt-i 


(**+   8k  * 


k    ^k-1 

oT  M  or    o 

0         0       J       0 
0T       0      0r       1 


obtain: 


Orthogonalization,      with  (^     =     3Li/\\ak\\)  ,      to 


Gt-i  3*  AJC-gJt(gJtrAJt)    rk_x-qk(qkTrk_J 


Rk-\  8k         Tk  ck-i 

°r  INI  Qk*Ak  QkTrk.x 

0  0  J  0 

V  or  o       or         i 


Relabel,  so  that  the  equation  (2.10)  can  be 


written  as: 


Qk  Ak  x* 


v 


\(*k    Tk   Ck\ 
0   J  0 

^0r  0r  1 


) 


The  above  mathematical  statements  complete  the 
insertion  of  the  kth  regressor.  Furthermore  the  work  required 
is  essentially  2n(p  -  k)  multiplications  and  additions.  The 
storage  space  required  and  the  part  of  storage  affected  by  the 
algorithm  is  illustrated  by  these  partitioned  matrices. 
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Suppose  now  that  we  have  to  enter  the  kth  variable 
(column)  into  the  model.  In  this  case  the  appropriate  column 
to  be  added  is  the  one  which  minimizes  the  norm  of  the 
residuals.  We  have  that  g^r*  =  0  and  from  the 
orthogonalization  and  relabel  steps,  that 

By  the  Pythagorean  theorem, 

IPW  "  W2  =  («/*«>■  (2.11) 

so  when  we  add  the  kth  variable  to  the  model  the  change  in  the 
residual  is  (g^r^x)2.  Recalling  that  q^j  =  akT/||  ak||  ,  the  change 
can  be  expressed  as  (q^T^^)2  =  (akTrk_x/||  ak||  )2.  So  the  chosen 
column  to  add  at  the  kth  stage  is  a  column  ad,  k<j<p,  for 
which  |  a/r^l  /||  a  j  ||  is  maximal.  A  test  for  significance  should 
be  satisfied  for  the  above  column  to  be  entered  into  the 
regression.  The  work  for  choosing  the  column  to  add  is  on  the 
order  of  2n(p  -  k)  multiplications  and  additions. 

To  compute  the  regression  coefficients  we  have  to 
follow  the  next  steps.  A  n  x  k  matrix  Xk  is  formed  by  the 
first  k  columns  of  X.  It  has  been  factored  Xk  =  0^,  where  Qk 
is  n  x  k  with  orthonormal  columns,  and  R^  is  upper  triangular. 
The  projection  of  y  onto  the  range  of  X  can  be  written  as: 

y~rk  =  Qtck  =  xbt  =  <Wv 

So  the  regression  coefficient  Y\  can  be  computed  easily  by 
solving  the  following  triangular  system 
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reguiring  an  order  of  0 . 5k2  multiplications  and  additions. 

The  next  step  is  the  computation  of  the  total  sum 
of  squares  ||y||2.  A  smart  mathematical  computation  is  taking 
place  here.  After  computing  the  sum  of  squares  due  to  the 
regression  at  the  kth  stage  of  the  regression,  as: 

since  ck  is  a  k-vector,  the  residual  sum  of  squares  is 
computed  directly  as  ||rk||2,  rather  than  by  subtraction  which 
would  mean  a  loss  of  accuracy  due  to  cancellation.  This  is 
recommended,  especially  if  the  mean  residual  sum  of  squares  is 
to  be  used  in  sensitive  tests  for  significance  levels.  The 
common  criterion  for  terminating  the  selection  process  is  the 
ratio  of  the  reduction  in  residual  sum  of  squares  caused  by 
the  next  candidate  variable  to  be  considered  to  the  residual 
mean  square  from  the  model  including  that  variable.  This 
criterion  can  be  expressed  in  terms  of  a  critical  "F-to-enter" 
or  in  terms  of  a  critical  "significance  level  to  enter"  (SLE) , 
where  F  is  the  "F-test"  of  the  partial  sum  of  squares  of  the 
variable  being  considered.  We  can  compute  the  "F-to-enter", 
Fe,  for  the  kth  variable  by  using  the  residual  sum  of  squares 
before  and  after  the  insertion  as  in  (2.6).  Let  RSS^!  = 
(r^)  T(rk_1)  and  RSSk  =  (rk)T(rk)  be  respectively  the  residual 
sum  of  squares  after  each  variable  addition.  Also  let  (n  -  m) 
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be  the  degrees  of  freedom  for  the  last  model,  where  n  and  m 
are  the  numbers  of  rows  and  the  columns  of  the  current  matrix 
X  of  the  independent  variables.  After  the  above,  the  "F-to- 
enter"  is 


Fe    = 


JiSSj-_.         KSSk 

RSSJ  (n-m) 


To  compute  the  covariance  matrix  of  the  regression 
coefficients  i\   using  the  normal  equations,  (XkTXk)_1  usually  is 
computed.  For  avoiding  this  numerically  unstable  process,  a 
new  procedure  is  followed  in  this  algorithm.  Using  the  above 
orthogonal  decomposition,  it  is  found  that: 

(XkTXk)  -1  =  iRj-*)  T(RkT)  , 


where  R^j  =  (Rk  X)T  =  (Km)"1*  The  approach  of  the  algorithm  is 
to  solve  RkTVk  =  I,  and  compute  (XkTXk)_1  as  VkTVk.  Hence,  Vk  is 
updated  as  follows: 


'I    0 

,0T   1 


(R>-,T    o  ^ 


( 


r 
VxJt-i 


INI 


"Jt-1 

1   _  Tx' 


\\H 


r*TVk-i 


\\*4 


=  R   TV 
Kk    vk- 


Thus,  Vk  differs  from  Vk.x  only  in  the  bordering  of  a  new  row 
and  column.  The  above  updating  of  Vk  is  mathematically 
equivalent  to  forwardsolving  R^Vfc  =  I  directly.  Notice  that 
the  computation  of  variances  (i.e.,  the  diagonal  of  the 
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covariance  matrix)  of  the  b^  requires  only  on  the  order  of  k- 
multiplications  and  additions.  The  variances  of  Y\  are  needed 
for  significance  tests  (i.e.,  t-test  and  F-test) . 

Having  entered  k  variables   (columns)   into  the 
regression,  we  have  the  following  factorization: 


/      > 

/ 

X  y 

= 

\        ) 

v 

0Jt  Ak    *k 


'**  **  C^ 


0   J  0 

[0T   0r  lj 

(k)  (p-k)   (1) 

where  Qk  is  n  x  k  with  orthonormal  columns,  P^  is  k  x  k  upper 
triangular,  and 


OkTrk  =  o,       QkTAk  =  °>       ck  =  QkTy- 


(2.12) 


Now,  to  delete  the  j.th  regressor,  where  1  <  j  <  k,  the  next 
steps  are  followed.  The  columns  of  X  are  permuted,  so  that  for 
some  permutation  matrix  P, 


/      \ 

/ 

X  y 

P  = 

V        J 

v 

Qjc  Ak  rk 


Rk.  (1,1)  Rk,  (1,2)  8j,i    Tk,i    ck,i 


V 


0 
0 

or 


Rk,(2,2)      8j,2     Tk,2     Ck,2 


0 

or 


I 

0 


(2.13) 


(j-1)         (k-j)         (1) (p-k)   (1) 

For  k  =  6  and  i  =  3,  the  right-hand  factor  looks  like  the 
following: 


39 


lJt,  (1,1)  "^Jt,  (1,2)  aj,l 
0      "J**,  (2,2)  8j,2 


\ 


XX  XXX  X 

X  XXX  X 

XXX         X 
XXX 
X   X 

x 


where  x's  denote  possible  nonzeros,  and  zeros  are  left  blank. 
After  that,  by  using  a  Givens  transformation  or  (rotation) , 
the  subdiagonal  of  this  matrix  is  annihilated.  A  Givens 
rotation  is  any  orthogonal  matrix  of  the  form 


'Jk 


=  Gk-l,k'  •  '   *  'Gj,j+1 


(j-1)     (k-j-fl)     (n-k)     (1) 
where   Gjk   is   of   the    form 


/y   0    . 
0    1 


0     a  \ 
0 


0  10 

a   0    .     .     .    0    -y 


The  j.  an<^  K  subscript  in  Gjk  correspond  to  the  row  numbers 
associated  with  the  7's:  The  first  7  is  in  row  k  and  the 
second  7  is  in  row  j_.  Next  the  equation  (2.13)  is  partitioned 
and  after  using  the  orthogonality  of  Givens  rotations  and 
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performing   multiplications,   the   following   eguation   is 
obtained: 


Gjt.i  Qk.2GT  Ak  ri 


\ 


St,  (i.i)   ^Jt,  (1,2) 


8j,l      rjt,l 


'k,  l 


0 
0 
0r 


Gi?Jt,  (2,2)      G3j.  2     GTJt,  2     GCJt,  2 


0 

or 


J 


0 

1 


(2.14) 


Furthermore  (2.14)  is  repartitioned  and  relabeled  to  get  its 
last  shape: 


£jt-l     Ak-1  rk-l 


•^Jt-1     ^k-1  Ck-1 


0      J       0 

A  or   or    l  ) 
(k-i)  (p-Jc+i)  (l)  (k-i)  (p-Jc+l)  (D 


Note  that  (2.14)  holds  with  k  replaced  by  k  -  1  and  that  the 
deleted  column  is  ready  to  reenter  the  regression  at  any  time. 
The  above  algorithm  uses  approximately  n (p+2k-3i ) 
multiplications  and  additions. 

However,  the  chosen  column  to  drop  is  that  one 
which  yields  the  smallest  increase  in  the  residuals.  This 
increase  for  the  j_th  independent  variable  is  |  ft  j  |  2I  ||Uj||2,  where 
£j  is  the  j_th  regression  coefficient,  and  ||i>j||2  is  the  j_th 
diagonal  element  of  (XkTXk)"1.  Thus  the  column  to  be  dropped,  if 
a  test  for  lack  of  significance  is  satisfied,  is  the  column  j. 
for  which  |/?j|/  ||Uj||  is  smallest. 
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b.  Adding  and  Deleting  Observations 

Now  n  rows  of  X  have  entered  into  the  regression, 
and  an  additional  row  xT  and  observation  n  of  y  is  going  to  be 
inserted.  In  this  situation,  a  factorization  has  taken  place 
and  we  have: 


/ 

v 


(*m    yn)        (0a    0  rn 


[xT   n)       [0T   1  0) 


n       n 

xT   n 


where  Qn  is  n  x  p  with  orthonormal  columns,  1^  is  p  x  p  upper 
triangular,  also 

QnTQn  =  I  and  p  x  p,  QnTrn  =  0,  and  <\  =  QnTyn.  (2.15) 
Furthermore  the  Givens  rotations  are  applied  and  after 
relabeling,  repartitioning  and  performing  some  elementary- 
operations  we  get  the  following: 


fR         c      N 


[Qn+1    f+YffJ 


Notice  that  (2.15)  holds  with  n  replaced  by  n  +  1  and  that  the 
above  algorithm  reguires  n  +  1  additional  storage  locations 
for  g.  However,  we  can  easily  update  V  =  R"T,  since 


I=R„TV={RnT  x) 


and  we  apply  the  Givens  rotations  to  get: 
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J  =    (RnT  x)GT> 


« 


s  <£fl+1T     o) 


vu+i 


•"■13*1     vn*l 


Now  we  want  to  delete  the  j_th  observation.  Then  if 
xT  is  the  3_th  row  of  X,  n  is  the  j_th  entry  of  y,  qT  is  the  jth 
row  of  Q,  and  p  is  the  j_th  entry  of  r,  a  permutation  matrix  P 
can  be  obtained  such  that: 

(  \ 

P    Xa  Yn 

\  > 


(X       y     ) 

f 

f        \\ 

a       a 

= 

p 

Qn    rn 

<xT      r\   j 

\ 

K                  /> 

,or   1, 

w*. 


V<3TT  p 


\ 


/ 


\ 

a        n 

Q    o  f 

0r    0 

QT  1    P, 

,or   1, 

'V0r  1 
Next,  we  apply  the  Gram-Schmidt  process.  After  this  we  have 


G)=a  =  (l-<grrg)  2,  aq=-Qq,     y=qTr,     r^^r-yq; 

when  o  =  0,  (gr#d))  is  chosen  by  a  special  feature  of  the 
orthogonalization  code.  Now  if  we  use  this  orthogonalization, 
we  get: 


( 


\* 


n 


( 


qT   (o  p 


\(  I    q  0\{RD  cD\ 
0r 


o  y 

0r  0  1) 


0T    0 

t    x 


^o 


/ 
Q    5  ra_a 
qT  u>      p 


n        n 

0T    y 


0r  1 


.  (2.16) 


Next  we  choose  Givens  transformations  such  that,  (qT,  w)GT  = 
(qT,  w)Gpp+1.  .  .Gi^  =  (0T,  r)  .  Since  ||  (qT,  u)  II  =  1/  it  follows 
that  t  =  ±1.  So  the  equation  (2.16)  can  be  written  as: 
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N 

I 

*n-i  yB-i 

= 

xT      n  , 

V 

\(g*  0^ 


lor  1 


0T    y 


V0r   1/ 


[0T    1 


0r     x       p 


(0T       1    ) 


because  the  matrix 


'  Q   g 


GT  = 


(Qn-x   ** 
0*      xj 


(2.17) 


has  orthonormal   columns   it   follows  q*  =  0  and  because  we  have 


(v  \ 


[o1- 


=  ^i,p+i  ■  •  •  ^p.p*i 


■Kfl-l 


where  R^-i  is  an  upper  triangular  matrix.  However  from  the 
equation  (2.17)  above,  we  obtain  the  desired  factorization 
which  is: 


■^a-i  yn-x 


Qn-l    rn-l 


R  C       \ 

-Kfl-l  Cd-1 

[oT      1  j 


and  we  can  recover  the  dropped  row  by: 


xT   =  xs  T,         r\   =  ty  +  P 


Finally,  we  have  to  update  V  =  R  .  Now, 
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(I    0 
,0T  1 


-or 


T      n 


a)  a  j 


\ 

V  o^ 

GTG 

/ 

vqrT  <•>; 

\ 

— rr  T_~n  1 


*«-!*  * 


*a-i^-i+^r 


TU 


*r 


*o-i  U+S 


XV 


Since  t    =   ±1,  it  follows  that  u*  =  0.  Hence   Rn-iTvn-i 
Vn_x  is  the  desired  update  [Ref .  10]  . 


=  I,  and 


3.   Tony  F.  Chan  (1986) 

An  algorithm  is  presented  for  computing  a  column 
permutation  II  and  a  QR  factorization  All  =  QR  of  an  n  by  m  (n 
>  m)  matrix  A  such  that  a  possible  rank  deficiency  of  A  will 
be  revealed  in  the  triangular  factor  R  having  a  small  lower 
right  block.  For  matrices  of  low  rank  deficiency,  the 
algorithm  reveals  the  rank  of  A,  and  the  cost  is  only  slightly 
more  than  the  cost  of  one  regular  QR  factorization.  An  upper 
and  lower  bound  on  the  singular  values  of  A  are  stated.  These 
can  be  used  to  infer  the  numerical  rank  of  A. 

A  very  useful  factorization  of  an  n  by  m  (n  >  m) 
matrix  is  the  QR  factorization,  given  by  All  =  QR,  where  IIeRnxn 
is  a  permutation  matrix,  QeR1"™  has  orthogonal  columns  and 
satisfies  QTQ  =  In,  and  ReRnxT1  is  upper  triangular. 

If  A  has  full  rank,  then  R  is  nonsingular.  In  many 
applications  in  which  A  is  nearly  rank  deficient,  it  is 
desirable   to   select   the   permutation   so   that   the   rank 
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deficiency  is  exhibited  in  R  having  a  small  lower  right  block. 
For  if  R  is  partioned  as 


R  = 


Rlt   R12 

0   *22, 


where  R22  is  d  by  d,  then  it  is  easy  to  show  that 
ofl_r+1  (A)  <.  1-^2  la  '  where  we  have  used  the  notation  ot  to  denote 
the  ith  singular  value  of  A,  with  o1  >  o2  >  ...  >  on. 
Therefore,  if  l-R^la  is  small,  then  A  has  at  least  d  small 

singular  values  and  thus  is  close  to  being  rank  d  deficient. 
The  converse,  unfortunately,  is  not  true.  In  other  words,  if 
A  has  d  small  singular  values,  then  it  is  not  guaranteed  that 

a  given  QR  factorization  of  A  has  a  small  |.R22|2.  LetAfle  3lnjm 
be  the  matrix  of  order  n  illustrated  below: 


(1     -c 
0   1 


AD  =  diag{l,8,82,  .  .  .  ,sDl) 


-c 
-c 


{0     . 


-c\ 

-c 

-c 


00  1  ) 


where  s  and  c  satisfy  s2  +  c2  =  1.  For  n  =  50,  c  =  0 . 2 ,  we  have 
on(An)    ~   1CT4  .  On  the  other  hand,  An  is  its  own  QR  factorization 

and  obviously  has  no  small  R22  block  for  any  value  of  d. 

Besides  being  able  to  reveal  rank  deficiency  of  A,  a 
QR  factorization  with  a  small  R22  block  is  very  useful  in  many 
applications,  such  as  in  the  rank  deficient  least  squares 
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problem  and  in  the  subset  selection  problem.  Therefore  a 
variety  of  techniques  have  been  proposed  to  compute  it.  Since 
QR  factorization  is  essentially  unique  once  the  permutation  II 
is  fixed,  these  techniques  all  amount  to  findinq  an 
appropriate  column  permutation  of  A.  Perhaps  the  best  known  of 
these  is  the  column  pivotinq  strateqy.  Althouqh  this  strateqy 
is  usually  very  effective  in  producinq  a  trianqular  factor  R 
with  small  ||  R22ll  *■  very  little  is  known  in  theory  about  its 
behavior,  and  it  can  fail  on  some  matrices.  Chan's  alqorithm 
does  not  require  computinq  the  SVD  of  A  and  is  most  closely 
related  to  one  recently  developed  by  Foster 
[Ref.  11]. 

a.  Revealing  Rank  One  Deficiency 

Assume  that  A  is  nearly  rank  one  deficient.  We 
would  like  to  find  a  column  permutation  of  A  such  that  the 
resultinq  QR  factorization  has  a  small  pivotal  element  rnn.  It 
turns  out  that  this  permutation  can  be  found  by  inspectinq  the 
size  of  the  elements  of  the  sinqular  vector  of  A  correspondinq 
to  the  smallest  sinqular  value  an.  This  procedure  was  first 
pointed  out  in  1976  [Ref.  12]. 

Assume  that  there  is  a  vector  x  e  Rn  with  ||x||2  =  1 
such  that  ||  Ax  ||  2  =  f,  and  let  II  be  a  permutation  such  that  if 
IITx  =  y,  then  \Tln\    =  |y||«„  and  ||y||2  =  ||x||2  =  1.  Such  a  x  can  be 
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obtained  by  the  LINPACK10  condition  estimator.  Now,  because 
of  these  we  have  |  r?n  |  >  y/TJn    and  furthermore 


/     \ 


Q  TAx  =  Q  TA  II  IIrJC  =  Ry  = 


{PnnV 


n) 


Therefore, 


c   =  \Ax\2   =  \QTAx\2  =  \Ry\2  >  Ip^tiJ, 


from  which  we  have  the  result,    Ip^l  ^  -Jn  e  .  Now  let  v  e  R"  with|vn|  =  1 

be     the     right     singular     vector     of     A     corresponding     to     the 
smallest   singular  value   on.    Then  we   have 

\Av[2   =  aa. 

Therefore,  by  the  above,  if  we  define  the  permutation  II  by 

I  (IFV)  |fl  =  \v\„, 

then  All  has  a  QR  factorization  with  a  pivot  pnn    at  least  as 
small  as  \fnaB    in  absolute  value. 

Since  only  the  permutation  II  is  needed,  it  is  not 
necessary  to  compute  the  SVD  of  A  in  order  to  find  v  exactly. 
In  practice,  one  can  use  a  few  steps  of  inverse  iteration  to 
compute  an  approximation  to  v  from  which  the  permutation  II  can 
be  determined.  In  the  more  interesting  case  where  on<:an_1,    the 


10Together,  LINPACK  and  EISPACK  represent  the  state  of  the 
art  in  software  for  matrix  computation. 
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inverse  iteration  should  converge  rapidly.  This  suggests  a  two 
pass  algorithm  in  which  one  first  computes  any  QR 
factorization  of  A,  then  performs  inverse  iteration  with  R  to 
find  an  approximate  v,  then  determines  II,  and  then  computes 
the  QR  factorization  of  All. 


b.  Revealing  Higher  Dimensional  Rank  Deficiency. 

In  this  section,  we  consider  the  case  where  A  is 
nearly  d  deficient,  with  d  >  1.  Our  goal  is  to  find  a 
permutation  II  such  that  if 


AH  =  QR  =  Q 


\ 


#11  R12 


is  the  QR  factorization  of  All,  with  R22  6  Rdxd,  then  l-R^I  is 

small  in  some  norm. 

A  natural  way  to  extend  the  one  dimensional  result 
of  subsection  a  is  to  repeatedly  apply  the  one  dimensional 
algorithm,  for  d  =  1,2,...,RU,  the  leading  principal 
triangular  part  of  R.  Suppose  that  we  have  already  isolated  a 
small  d  x  d  block  R22.  To  isolate  a  small  (d+1)  x  (d+1)  block, 
we  compute,  using  the  one  dimensional  algorithm  given  in 
subsection  a,  a  permutation  P  such  that  R^^  =  Q1R11    is  the  QR 

factorization  of  RnP  and  where  the  (n-d,  n-d)th  element  ofRi:L 
is  small.  Then  with 
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It  -n 


p  o 

o  I 


where,   P  is  a  permutation  matrix  such  that  P  6  Rixi, 
|(Prv)i|  =  IP'vL  and  the  singular  vector  v    e    R1  corresponding 
to  amin(Rn)  with  ||u||  2  =  1,  and  SL   =   amin(Ru)  ,  also 


0  =  0 


(01   o 

0  I. 


After  the  above,  it  can  be  easily  verified  that 


A&  =  Q 


■^11    &  "^12 


^22  , 


is  the  QR  factorization  of  AC. 

To  make  the  above  procedure  more  understandable, 
the  updating  process  is  illustrated  for  n  =  5  and  i  =  2.  The 
permutation  is  defined  by  moving  the  second  column  of  R  to  the 
last  column.  Thus 


RJli   = 


<X    X     X     x  x\ 

X      X      X  X 

\\      X      X  + 

n,  x  + 

°3  * 


where  again,  x  =  any  nonzero  element,  +  =  nonzero  element 
being  introduced  and  n  =  element  being  annihilated.  So  working 
from  the  left  in  planes  (i,i+l),  (i+l,i+2),  ...  ,  (n-l,n)  we 
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annihilate  the  subdiagonal  of  Rll1  by  rotations.  Simultaneously 
we  compute  Q  =  QQX.    The  computation  It  =  Ei^  is  trivial,  like 

that  of  Rl^.  If  CJmin(Rn)  is  "tiny"  then  we  have  for  n  =  5, 


A  =  QR  ft"   =  Q 


X   X   X   X   X 

X   X   X   X 

XXX 

X   X 


n* 


with  e    also  tiny.  If  €    is  negligible  we  may  drop  the  last  row 
of  R   and  the  last  column  of  Q   to  get 


A   =  Q 


(x  x  x  x   D4\ 

x  x  x   Dj 

x   DJ 


ft" 


with  Q   having  orthonormal  columns.  Now  if  we  annihilate  the 

last  column  of  the  upper  trapezoidal  matrix,  as  before,  from 
the  bottom  to  top,  and  then  drop  the  last  columns  of  the  II  and 

R  matrices,  we  obtain  A  =  QR  ft" . 

The  above  algorithm,  to  produce  the  desired  QR 
factorization,  is  based  on  the  following  two  assertions  for  i 
=  n,  n-1,  ...,  n-d+1: 

■  Rn  has  a  small  singular  value  so  that  the  (i,i)th 

element  of  RX1    is  guaranteed  to  be  small. 
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■  The  last  [i.e.,  the  (n-i)th]  row  of  Qi^     is 

small . 

If  these  two  assertions  are  true,  then  the  lower 
(n-i+1)  x  (n-i+1)  block  of  R  is  small  and  we  have  the  desired 
QR  factorization.  But  these  two  assertions  are  true  because  of 
the  following  lemmas  proved  in  [Ref.  13]. 

Let  B  g  Rnxk  be  a  matrix  containing  any  subset  of  k 
columns  of  A.  Then 


'min 


(B)    =  oAB)    <.  aAA)   . 


Also  if  the  matrix  W  =  [wn.r+1,  .  .  .  ,wn]  e  Rnxr  has  been 
computed  by  the  above  algorithm  then  it  should  satisfy  the 
following  properties: 

■  II  Wjl  2  =  1, 

■  (wjj  =  0  for  j  >  i, 

■  I  (Wi)i|   =  ||wxL>=i/yi, 

■  HAlIwJU   =    5±    <    ffi(A)  . 

Finally,  Chan's  algorithm  [Ref.     13] 

computes  a  permutation  II  and  a  QR  factorization  of  A  given  by 
All  =  QR  where  the  elements  of  the  lower  d  x  d  upper  triangular 
block   of   R   satisfies 

jt=i 
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for  n-d  <  i  <.  j  <,  n. 

Very  often,  it  is  desirable  to  be  able  to  infer  the 
rank  of  A  from  its  QR  factorization  by  estimating  the  small 
singular  values  of  A  from  the  triangular  factor  R 
[Ref.  14].  For  this,  we  state  bounds  on  the  singular 
values  of  A  in  terms  of  the  matrices  R  and  W,  given  by  the 
following  inequality,  for  1  <  j  <  d, 

where,  W  e  Rnxn,  R  e  Rnxn  and  computed  by  the  rank  revealing 
algorithm,  R22J  and  W2j  denote  the  lower  right  i  by  j  upper 
triangular  blocks  of  R  and  W  respectively.  These  bounds  are 
proved  in  [Ref.  13]. 

Now  using  the  obtained  All  =  QR  factorization  we  can  solve 
the  least  square  problem  Ax  =  b  as  follows.  Let  r  be  the  rank 
of  the  n  x  m  matrix  A,  then  the  regression  coefficients  are 
given  by: 

x  =  n..i<(Ki.1)"aft1.ib) 

where  i  =  1,  2,  3,  ...  ,  r. [Ref .  15] 

The  work  for  the  AH  =  QR  factorization  is  given  by: 
W(r)  =  m2(n  -  m/3)  +  Im2r  +  2m2r,  where  I  is  the  number  of 
inverse  iterations  used  at  each  step.  Usually  I  =  2  is 
sufficient  in  practice.  So 
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W(r)  =  m2(n  -  m/3)  +  4m2r.  (2.18) 

C.   PROCESS 

In  the  problem  of  finding  a  vector  x  e  Rn  such  that  Ax  = 
b  where  the  data  matrix  A  e  R"1™  and  the  observation  vector 
b  e  Rm  are  given  and  m  >  n  when  there  are  more  eguations  than 
unknowns,  we  say  that  the  system  Ax  =  b  is  overdetermined. 
Usually  an  overdetermined  system  has  no  exact  solution  since 
b  must  be  an  element  of  range  (A) ,  a  proper  subspace  of  Rm. 

This  suggests  that  we  strive  to  minimize  ||  Ax  -  b||p  for 
suitable  choice  of  p.  Different  norms  render  different  optimum 
solutions.  However,  much  progress  has  been  made  in  this  area, 
and  there  are  several  good  techniques  available  for  1-norm  and 
oo-norm  minimization.  The  oldest  method  for  solving  the  full 
rank  least  squares  problem  is  the  method  of  normal  equations. 
The  accuracy  of  the  computed  normal  equations  solution  depends 
on  the  square  of  the  condition  number,  and  the  algorithm  is 
not  always  accurate. 

For  the  above  reason  some  techniques  are  established 
based  on  the  QR  factorization  method.  The  Householder  and 
Gram-Schmidt  QR  approaches  to  the  least  squares  problem  are 
more  stable  than  the  normal  equation  method.  Furthermore 
techniques  for  rank  revealing  QR-factorization  (i.e.,  Chan's 
algorithm  discussed  in  subsection  3)  can  give  a  solution  to 
the  subset  selection  problem  even  if  A  is  nearly  rank 
deficient. 
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In  the  following  chapters  we  will  try  to  compare  the 
results  of  the  above  algorithms  computationally  to  verify  the 
effectiveness  of  each  on  the  least  sguares  problem. 
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III.    RESULTS  AND  COMPARISONS 

A.   PREVIOUS  COMPARISONS 

There  have  been  a  number  of  studies  comparing  Householder 
and  Modified  Gram-Schmidt  QR  techniques  to  form  an  orthonormal 
basis  of  X  (A)  [Ref.  16].  Moreover,  in  Elden's  paper 
there  is  a  comparison  between  his  and  Efroymson's  algorithm 
for  updating  regressions.  There  are  comparisons  in  work  and 
accuracy  between  the  Classical  and  Modified  Gram-Schmidt 
methods  in  the  Gragg-Leveque-Trangenstein  paper.  Tony  Chan 
also  compares  his  column  permutation  QR  factorization 
algorithm  with  the  regular  QR  algorithm,  in  case  of  required 
work  [Ref.  17].  Chan's  comparison  is  applied  only  on 
the  QR  factorization  part  of  the  regression  updating 
procedure. 

Furthermore,  an  algorithm  for  solution  of  the  subset 
selection  problem  is  presented  in  a  technical  support  package 
of  NASA  Tech  Briefs  [Ref.  18]  that  can  be  mentioned 
as  a  modified  Chan's  algorithm.  So  a  comparison  can  be  done 
between  these  algorithms.  Finally  a  justification  of  the  use 
of  the  reorthogonalization  in  Gram-Schmidt  QR  factorization  is 
presented  in  the  Daniel-Gragg-Kaufman-Stewart  paper 
[Ref.  19]. 


56 


B.   MODEL  AND  DATA 

The  data  in  TABLE  II-l  will  be  used  to  illustrate  the 
model  selection  methods  in  the  full  rank  case.  In  the  table 
below  there  are  two  matrices,  the  13x5  matrix  X  consisting 
of  a  column  of  ones,  followed  by  the  4.  column  vectors  of  the 
observations  on  the  independent  variables,  and  the  13x1 
column  vector  Y  of  observations  of  the  dependent  variable.  In 
order  to  conform  with  previous  results,  the  data  utilized  in 
this  experiment  is  identical  to  that  used  in  Elden's  algorithm 
[Ref.  8].  We  began  the  stepwise  regression  analysis  by 
inserting  and  deleting  columns  of  the  observation  matrix  X  and 
we  continued  by  inserting  and  deleting  rows  of  the  model. 

The  data  in  TABLE  II-2  and  TABLE  II-3,  have  one  and  two 
dependent  columns  respectively.  We  obtained  the  first  of  them 
by  subtracting  columns  2  and  3  of  the  data  in  TABLE  II-l  and 
the  second  by  repeating  the  first  column  of  the  same  data. 
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c 

TABLE  II-l 

^SE  STUDY  1  ( INPUT  DATA) 
MATRIX  OF  OBSFRVA^TnMQ 

MATRIX  OF 

'  OBSERV. 

ON  THE  INI 

COLUMN 
VECTOR  OF 
OBSER.  ON 
DEPENDENT 
VARIABLES 

DEPENDENT  VARIABLES 

1.00 

7.00 

26.00 

6.00 

60.00 

78.50 

1.00 

1.00 

29.00 

15.00 

52.00 

74.30 

1.00 

11.00 

56.00 

8.00 

20.00 

104.30 

1.00 

11.00 

31.00 

8.00 

47.00 

87.60 

1.00 

7.00 

52.00 

6.00 

33.00 

95.90 

1.00 

11.00 

55.00 

9.00 

22.00 

109.20 

1.00 

3.00 

71.00 

17.00 

6.00 

102.70 

1.00 

1.00 

31.00 

22.00 

44.00 

72.50 

1.00 

2.00 

54.00 

18.00 

22.00 

93.10 

1.00 

21.00 

47.00 

4.00 

26.00 

115.90 

1.00 

1.00 

40.00 

23.00 

34.00 

83.80 

1.00 

11.00 

66.00 

9.00 

12.00 

113.30 

1.00 

10.00 

68.00 

8.00 

12.00 

109.40 

To  obtain  the  data  in  the  Table  II-2,  (a  non-full  rank 
matrix)  ,  we  replaced  the  column  4  of  the  Table  II-l  with  a  new 
column,  column  6.  This  new  column  was  obtained  by  subtracting 
columns  2  and  3  of  the  matrix  of  observations  in  Table  II-l. 
Now  the  new  matrix  of  observations  has  a  linearly  dependent 
column  and  so  it  is  rank  one  deficient. 
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TABLE  II-2 

CASE  STUDY  2a  (INPUT  DATA) 
MATRIX  OF  OBSERVATIONS 

MATRIX  OF  OBSERV.  ON  THE  INDEPENDENT  VARIABLES 

COLUMN 
VECTOR  OF 
OBSER.  ON 
DEPENDENT 
VARIABLES 

1.00 

60.00 

7.00 

26.00 

-19.00 

78.50 

1.00 

52.00 

1.00 

29.00 

-28.00 

74.30 

1.00 

20.00 

11.00 

56.00 

-45.00 

104.30 

1.00 

47.00 

11.00 

31.00 

-20. 00 

87.  60 

1.00 

33.00 

7.00 

52.00 

-45.00 

95.90 

1.00 

22.00 

11.00 

55.00 

-44. 00 

109.20 

1.00 

6.00 

3.00 

71.00 

-68.00 

102.70 

1.00 

44.00 

1.00 

31.00 

-30.00 

72.50 

1.00 

22.00 

2.00 

54.00 

-52.00 

93.10 

1.00 

26.00 

21.00 

47.00 

-26.00 

115.90 

1.00 

34.00 

1.00 

40.00 

-39.00 

83.80 

1.00 

12.00 

11.00 

66.00 

-55.00 

113.30 

1.00 

12.00 

10.00 

68.00 

-58.00 

109.40 

Furthermore  in  Table  II-3  we  inserted  once  more  the  column 
1  of  the  matrix  in  Table  II-2  as  a  new  column  7.  This  column 
took  the  third  place  in  the  matrix  and  reduced  its  rank  to 
rank  two  deficient. 
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TABLE  II-3 

CASE  STUDY  2b  ( INPUT  DATA) 
MATRIX  OF  OBSERVATIONS 

MATRIX  OF  OBSERV.  ON  THE  INDEPENDENT  VARIABLES 

COLUMN 
VECTOR  OF 
OBSER.  ON 
DEPENDENT 
VARIABLES 

1.00 

60.00 

1.00 

7.00 

26.00 

-19.00 

78.50 

1.00 

52  .00 

1.00 

1.00 

29.00 

-28.00 

74.30 

1.00 

20.00 

1.00 

11.00 

56.00 

-45.00 

104.30 

1.00 

47.00 

1.  00 

11.00 

31.00 

-20.00 

87.60 

1.00 

33.  00 

1.00 

7.00 

52.00 

-45.00 

95.90 

1.00 

22.00 

1.00 

11.00 

55.00 

-44.00 

109.20 

1.  00 

6.00 

1.00 

3.00 

71.00 

-68.00 

102.70 

1.00 

44  .  00 

1.00 

1.00 

31.00 

-30.00 

72.50 

1.00 

22.00 

1.00 

2.00 

54.00 

-52.00 

93.10 

1.00 

26.  00 

1.  00 

21.00 

47.00 

-26.00 

115.90 

1.00 

34.00 

1.00 

1.00 

40.00 

-39.00 

83.80 

1.00 

12  .00 

1.  00 

11.00 

66.00 

-55.00 

113.30 

1.00 

12  .  00 

1.  00 

10.00 

68.00 

-58.00 

109.40 

C.   SIMULATION 

As  mentioned,  the  three  compared  algorithms  use  a  QR 
factorization  technique  to  solve  the  regression  problem.  But 
the  three  techniques  used  have  different  philosophies  and 
different  advantages  and  disadvantages.  However,  to  obtain  the 
numerical  results  needed  for  the  comparison  we  used  three 
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analogous  sets  of  MATLAB  codes.  It  is  important  that  Elden's 
algorithm  is  not  a  "full"  updating  regression  algorithm 
because  there  is  no  way  to  update  regressions  using 
Householder  QR  factorization.  Elden  adds  or  deletes  columns  or 
rows  at  the  beginning  or  the  end  of  the  observations  matrix 
and  does  not  explicitly  update  the  Householder  QR 
factorization.  Because  of  this  we  did  not  implement  Elden's 
algorithm.  Instead,  we  used  the  basic  QR  factorization 
algorithm  to  solve  each  individual  problem.  For  column 
insertions  and  deletions  on  the  right  of  the  matrix  our 
algorithms  are  computationally  equivalent  with  Elden's 
algorithms. 


D.   VALIDATION 

A  numerical  example  was  needed  to  illustrate  the  theory 
developed  in  Chapter  II,  so  for  each  of  the  three  algorithms 
we  ran  the  corresponding  MATLAB  codes  on  the  data  of  the  case 
study.  All  computations  are  done  on  a  286  PC  in  double 
precision,  with  a  relative  machine  precision  of  about  10"16. 
Notice  that  during  the  stepwise  procedure  on  case  study  1, 
there  was  no  rank  deficient  case,  so  there  was  no  illustration 
of  the  effectiveness  of  Chan's  algorithm  to  reveal  the 
numerical  rank  of  a  given  matrix. 


61 


E.   MEASURES  OF  EFFECTIVENESS 

1.   Case  Study  1  (Full  Rank  Case) 

In  order  to  compare  the  performances  of  the  algorithms, 
the  STSC'S  Statistics/Graphics  package  STATGRAPHICS  4.0  was 
used  on  the  same  data.  The  results  of  the  algorithms  and 
STATGRAPHICS  stepwise  variable  selection  procedure  are 
presented  in  the  following  tables. 

STEPWISE     REGRESSION     ANALYSIS     WITH     ORTHOGONAL 
TRANSFORMATIONS 
NUMBER  OF  OBSERVATIONS   13 


STEP  Na  1 

TABLE  III-l.  Summary  Statistics  for  Stepwise  Selection  of 
Variables  for  the  Data  in  TABLE  II  after  step  1. 


VARIABLE  ENTERED  AFTER  THE  COLUMN  OF  ONES:  Column  5 


(1) 

ELDEN 


(2) 
GRAGG- 
LEVEQUE- 
TRANGENST 


(3) 

CHAN 


(4) 
S  •  G  •  S  . 
STATGRARH. 
STSC 


CONST.    117.5679312   117.5679312 


117.5679312 


117.567931 


VARIABLE 
COEFFICIENTS 


ALPHA 5 


-0.7381618 


-0.7381618 


-0.7381618 


-0.738162 


RES. 

SUM 

SQUAR 


883.8669169 


883.8669169 


883.8669169 


881.89616 


Fe 

LEVEL 


22.7985202 


22.7985202 


22.7985202 


22.7985 


F  OF 

GENER. 
MODEL 


22.7995203 


22.7995203 


22.7995203 


22.80 
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STEP  N8  2 

TABLE  III-2.  Summary  Statistics  for  Stepwise  Selection  of 

Variables  for  the  Data  in  TABLE  II  after  step  2. 


VARIABLE  ENTERED:  Column  2 

(1) 

(2) 

(3) 

(4) 

CONST. 

103.0973816 

103.0973816 

103.0973816 

103.097382 

VARIABLE 
COEFFICIENTS 

ALPHA5 

-0.6139536 

-0. 6139536 

-0.6139536 

-0.613954 

ALPHA2 

1.4399583 

1.4399583 

1.4399583 

1.439958 

RSS 

74.7621122 

74.7621122 

74.7621122 

74.7621 

Fe 

108.2239093 

108.2239093 

108.2239093 

108.2239 

F  GEN. 
MODEL 

176.6269531 

176.6269531 

176.6269531 

176.6270 

STEP  Ng  3 

TABLE  III-3.  Summary  Statistics  for  Stepwise  Selection  of 

Variables  for  the  Data  in  TABLE  II  after  step  3. 


VARIABLE  ENTERED  :  Column  3 

(1) 

(2) 

(3) 

(4) 

CONST. 

71.6483069 

71.6483069 

71.6483069 

71.648307 

VARIABLE 
COEFFICIENTS 

ALPHA 5 

-0.2365402 

-0.2365402 

-0.2365402 

-0.236540 

ALPHA2 

1.4519379 

1.4519379 

1.4519379 

1.451938 

ALPHA 3 

0.4161098 

0.4161098 

0.4161098 

0.41611 

RSS 

47.9727294 

47.9727294 

47.9727294 

47.9727 

Fe 

5.0258647 

5.0258647 

5.028647 

5.0259 

F  GEN. 
MODEL 

166.8316801 

166.8316801 

166.8316801 

166.8317 
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STEP  Na  4 

TABLE  III-4.  Summary  Statistics  for  Stepwise  Selection  of 

Variables  for  the  Data  in  TABLE  II  after  step  4. 


VARIABLE  REMOVED:  Column  5 

(1) 

(2) 

(3) 

(4) 

CONST. 

52.5773489 

52.5773489 

52.5773489 

52.577349 

VARIABLE 
COEFFICIENTS 

ALPHA2 

1.4683057 

1.4683057 

1.4683057 

1.468306 

ALPHA 3 

0.6622505 

0.6622505 

0.6622505 

0.662250 

RSS 

57.9044832 

57.9044832 

57.9044832 

57.9045 

Fe 

1.8632873 

1.8632873 

1.8632873 

1.8633 

F  GEN. 
MODEL 

229.5036971 

229.5036971 

229.5036971 

229.5040 

STEP  Na  5 

TABLE  III-5.  Summary  Statistics  for  Stepwise  Selection  of 

Variables  for  the  Data  in  TABLE  II  after  step  5. 


OBSERVATION  ENTERED:  Row  3,  (a    second  time) 

(1) 

(2) 

(3) 

(4) 

CONST. 

52.6817201 

52. 6817201 

52.6817201 

52.681720 

VARIABL 

E 

COEFFIC 

IENTS 

ALPHA2 

1.4584656 

1.4584656 

1.4584656 

1.458466 

ALPHA3 

0.6594452 

0.6594452 

0.6594452 

0.659445 

RSS 

59.9550974 

59.9550974 

59.9550974 

59.9551 

Fe 

F  GEN. 
MODEL 

250.3437770 

250.3437770 

250.3437770 

250.3438 
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STEP  N3  6 

TABLE  III-6.  Summary  Statistics  for  Stepwise  Selection  of 

Variables  for  the  Data  in  TABLE  II  after  step  6. 


OBSERVATION  ENTERED:  Row  2.  fa  second  time) 

(1) 

(2) 

(3) 

(4) 

CONST. 

53.0380112 

53.0380112 

53.0380112 

53.038011 

VARIABLE 
COEFFICIENTS 

ALPHA2 

1.4484905 

1.4484905 

1.4484905 

1.448490 

ALPHA3 

0.6549147 

0. 6549147 

0.6549147 

0.654915 

RSS 

60.8055442 

60. 8055442 

60.8055442 

60.8055 

Fe 



F  GEN. 
MODEL 

312.7948771 

312.7948771 

312.7948771 

312.7950 

STEP  N3  7 

TABLE  III-7.  Summary  Statistics  for  Stepwise  Selection  of 

Variables  for  the  Data  in  TABLE  II  after  step  7. 


OBSERVATION  REMOVED:  Row  1 

(1) 

(2) 

(3) 

(4) 

CONST. 

53.0380116 

53.0380116 

53.0380116 

53. 033012 

VARIABLE 

COEFFICIENTS 

ALPHA2 

1.4484905 

1.4484905 

1.4434905 

1.448491 

ALPHA3 

0.6549147 

0.6549147 

0.6549147 

0.654915 

RSS 

60.8055442 

60.8055442 

60.8055442 

60.8055 

Fe 

F  GEN. 
MODEL 

312.7948771 

312.7948771 

312.7948771 

312.7949 
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2.   Case  Study  2  (Rank  Deficient  Case) 

Here  we  want  to  test  the  thesis  algorithms  on  a  rank 
deficient  case.  So  we  used  the  singular  matrices  in  Tables  II- 
2  and  II-3.  The  first  matrix  has  one  linearly  dependent  column 
(column  5)  and  the  second  two  linearly  dependent  columns 
(column  3  and  6).  We  applied  the  corresponding  algorithms' 
MATLAB  codes  to  those  matrices.  The  results  of  the  algorithms' 
effectiveness  are  shown  in  the  following  tables. 


TABLE  IV-1.  Summary  statistics  for  the  rank  deficient  case, 
(case  study  2a) . 


RESULTS  OF  CASE  STUDY  2a  REGRESSION 

(1) 

ELDEN 

(2) 

GRAGG- 
LEVEQUE- 
TRANGENST. 

(3) 

CHAN 

(4) 
S  •  G  •  S  • 
STATGRARH 
STSC 

CONST. 

3.2155*1018 

-146.000000 

71.6483069 

VARIABL 

E 

coeffici: 

ALPHA5 

-0.0353*1018 

2.000000 

-0.2365402 

ALPHA2 

-0.0194*1018 

2.5676*101A 

1.4519379 

ALPHA3 

-0.0173*1018 

-2.5676*10u 

0.4161098 

ALPHA 6 

-0.0132*1018 

-2.5676*10u 

0.0000000 

RES. 

SUM 

SQUAR. 

5.7494*1036 

45.8653934 

47.9727294 

F  OF 

GENER. 
MODEL 

9.2803*10"3A 

116.4231889 

166.8316801 
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TABLE  IV-2 .  Summary  statistics  for  the  rank  deficient  case, 
(case  study  2b) . 


RESULTS  OF  CASE  STUDY  2b  REGRESSION 

(1) 

(2) 

(3) 

(4) 

CONST. 

5.786*10u 

-7.  6870*10A5 

71.6483097 

VARIABLE 
COEFFICIENTS 

ALPHA5 

-0.007242 

3.8663*1012 

-0.2365402 

ALPHA7 

-5.786*101A 

7.6870*10A5 

0.0000000 

ALPHA2 

0.708*10u 

-3.8663*1012 

1.4519379 

ALPHA3 

-0.708*10u 

3.8663*1012 

0.4161098 

ALPHA6 

-0.708*101A 

3.8663*1012 

0.0000000 

RSS 

59. 17750 

1.4738*1028 

47.9727294 

F  GEN. 
MODEL 

212.9952033 

3.953*10"31 

166.8316801 

A  discussion  of  the  above  results  is  given  in  the 
following  chapter. 
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IV.   CONCLUSIONS  AND  RECOMMENDATIONS 

A .   GENERAL 

The  major  emphasis  of  this  thesis  was  to  examine  the 
performance  of  three  algorithms  for  updating  regressions.  As 
mentioned,  the  algorithms  examined  were  Elden  (Householder) , 
Gragg-Levegue-Trangenstein  and  Chan.  After  running  the 
corresponding  MATLAB  codes  we  obtain  the  results  shown  in 
Tables  III  and  IV.  The  discussion  of  these  results  will  be 
divided  into  two  categories:  the  accuracy  and  stability,  and 
the  number  of  computations. 

1.   Accuracy  and  Stability 

a.  Full  Rank  Case 

The  results  of  the  case  study  1,  in  Table  III,  show 
us  that  no  one  algorithm  uniformly  outperforms  any  other  with 
regard  to  accuracy  in  the  full  rank  case.  All  algorithms  give 
exactly  the  same  results  in  all  steps. 

b.  Deficient  Rank  Case 


Comparing  the  results  of  the  case  study  2  (TABLES 
IV-1  and  IV-2)  with  the  results  of  the  case  study  1  step  3 
(TABLE  III-3)  we  can  see  that  they  are  identical  except  for 
the  coefficients  of  variables  ALPHA6  and  ALPHA7 .   These 
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variables  are  used  only  in  the  case  study  2  and  they 
correspond  to  the  linearly  dependent  columns  of  the  matrix  of 
observations  (see  TABLES  II-l  and  II-2).  The  values  of  these 
variables'  coefficients  in  both  rank  deficient  cases  are  zero. 
Because  of  the  above  we  can  say  that  the  results  of  Chan's 
algorithm  in  the  rank  deficient  case  without  the  linearly 
dependent  columns  are  the  same  as  the  other  algorithms' 
results  in  the  full  rank  case.  However  in  case  study  2  (rank 
one  and  rank  two  deficient  cases)  only  Chan's  algorithm  gives 
reasonable  results.  The  results  of  the  other  two  algorithms 
are  totally  wrong.  Now  it  is  easy  to  understand  that  the 
philosophy  of  Chan's  algorithm  is  to  drop  the  linearly 
dependent  columns  of  the  matrix  of  observations  and  work  with 
the  rest  of  them  to  obtain  the  regression  coefficients. 

2 .   The  Number  of  Computations 

To  compare  the  volume  of  computation  of  the  present 
algorithms  we  consider  that  all  m  column  vectors  of  an  n  x  m 
matrix  A  are  added  to  the  subspace. 

a.  Computing  Regression  Coefficients  at  Each  Step 

In  the  case  where  the  regression  coefficients  are 
computed  at  each  step  the  work  of  algorithms  is  as  follows. 
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The  number  of  flops11  of  Elden's  method  is  about 
2 nm2,  (2(n  -  m)m2  +  4m3/3  for  producing  the  QR  factorization, 
m3/3  for  updating  the  inverse  of  the  triangular  matrix  and  m3/3 
for  computing  the  regression  coefficients) .  Elden's  algorithm, 
as  mentioned  in  Chapter  II  subsection  Bl,  is  updating 
regression  without  using  at  each  step  the  matrix  Q  but  working 
only  on  the  triangular  matrix  R.  This  procedure  avoids 
computations  on  matrix  Q  and  so  is  cheap. 

If  the  data  matrix  is  not  very  rank  degenerate  Chan 
uses  same  work  as  Elden  on  the  updating  regression  procedure, 
(Householder  without  updating  Q)  .  Using  the  eguation  (2.18) 
for  this  case  with  r  =  m  we  have  a  total  work  for  the  rank 
revealing  QR  factorization  of  W(m)  =  nm2  +  llm3/3.  Also  the 
amount  of  flops  for  computing  the  regression  coefficients  is 
m3/3.  In  other  words  the  work  for  Chan's  QR  factorization  of 
RII  is  about  nm2  +  4m3  flops. 

Gragg-Levegue-Trangenstein' s  algorithm  reguires  8nm 
flops  for  column  insertion,  0  flops  for  column  deletion,  3m(m 
+  2n)  for  row  insertion,  m(3m  +  14n)  flops  for  row  deletion. 
In  this  case  the  regression  coefficients  are  computed  at  each 
step  we  have  to  add  the  work  of  solving  the  triangular  system 
Rnjbn,  =  cm.  That  reguires  a  total  of  m3/2  flops.  This  amount  of 
flops  is  because,  as  mentioned  in  Chapter  II  subsection  B2 , 


nFlops  number  means  the  whole  number  of  multiplications 
and  additions. 
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this  algorithm  updates  the  regression  using  the  matrix  Q  for 
keeping  the  numerical  stability.  That  means  that  in  the  worst 
case  the  algorithm  reguires  a  total  work  of  about  6m2  +  2  8nm 
+m3/2  flops. 

After  the  above  we  can  see  that  Elden's  and  Gragg- 
Leveque-Trangenstein ' s  algorithms  outperforms  Chan's 
algorithm.  As  mentioned,  in  this  thesis  we  always  examine  the 
case  in  which  the  observation  matrix  has  more  rows  than 
columns  (i.e.,  n  >  m) .  Keeping  a  ratio  n/m  =  2  Gragg-Leveque- 
Trangenstein • s  algorithm  is  the  cheapest  for  n  bigger  than  18. 
This  means  that  for  big  applications  this  algorithm  is  the 
cheapest. 

b.  Computing  Regression  Coefficients  Once 

In  the  case  where  the  regression  coefficients  are 
computed  once  we  have  a  lower  order  work  for  this  computation 
which  can  be  ignored.  So  the  total  work  for  the  algorithms  is: 
Elden's  2nm2  -  m3/3,  Gragg-Leveque-Trangenstein' s  6m2  +  2  4nm 
and  Chan's  nm2  ±  llm3/3.  Keeping  the  same  ratio  as  above  Gragg- 
Leveque-Trangenstein'  s  algorithm  is  still  the  cheapest  for  n 
bigger  than  15. 


B.   RECOMMENDATIONS 

In  this  thesis  we  examined  three  procedures   of  QR 
factorization  to  update  the  variable  selection  problem.  The 
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above  factorization  can  be  also  used  on  several  other 
mathematical  and  statistical  procedures.  Important  research 
can  be  done  on  using  the  QR  updating  algorithms  to  solve  the 
linear  programming  problem  using  the  simplex  method  and 
Karmarkar's  algorithm. 

As  mentioned  in  Chapter  III  section  D,  we  used  Householder 
codes  to  simulate  the  procedure  of  Elden's  algorithm.  A  closer 
comparison  can  be  done  by  a  full  simulation  of  this  algorithm. 
Finally,  it  should  be  interesting  to  extend  this  thesis  in 
case  where  we  have  an  observation  matrix  with  fewer  rows  than 
columns  (i.e.,  n  <  m)  . 
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APPENDIX  A.  MATLAB  CODES  FOR  HOUSEHOLDER'S  (ELDEN)  ALGORITHM 

The  codes  qrhf.m,  qrhup.m,  qrucd.m,  qruri.m,  qrurd.m 
qrorth.m  and  rot.m  were  reproduced  here  with  Professor's  Gragq 
permission. 

function  W  =  qrhf(A) 

%  W  =  qrhf (A) : 
% 

%  W  is  a  pack  formed  array  which  contains  the  HOUSEHOLDER  QR 
%  FACTORIZATION  of  A.  We  have  A  =  QR  with  Q  unitary  and  R 
%  upper  trapezoidal  with  nonnegative  diagonal  elements.  The 
%  commands  R  =  W,  [Q  R]  =  qrhup(R),  executed  by  qrh,  unpack. 
%  However  this  is  not  necessary,  and  in  fact  it's  inefficient, 
%  for  most  applications. 

%  Copyright  (c)  16  February  1991  by  Bill  Gragg.  All  rights 
%  reserved. 

%  qrhf  calls  sgn. 

%  begin  qrhf 

[n  m]  =  size(A); 
for  k  =  l:min(m,n) 

q  =  k:n;    u  =  A(q,k) ;    r  =  norm(u) ;    t  =  u(l) ; 
if  r  >  0 

t  =  u(l) ;        s  =  -  sgn(t) ;     u(l)  =  t  -  r*s; 
t  =  abs(t);      t  =  1  +  t/r;      d  =  sqrt(2*t); 
A(q,k)  =  u/d;    d  =  r*sqrt(t);    u  =  u/d; 
if  k  <  m 

p  =  k+l:m;    W  =  A(q,p)  ;      W  =  W  -  u*(u'*W)  ; 
t  =  W(l,:);    W(l,:)  =  s'*t;    A(q,p)  =  W; 
end 
end 
end 
W  =  A; 
%  end  qrhf 


function  [Q,R]  =  qrhup(R) 
%  [Q  R]  =  qrhup(R) : 
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%  Computes  the   ELEMENTWISE   QR  factorization  of  A  given  the 

%  output  R.  Thus  [Q  R]=qrh(A) :=qrhup(qrhf (A) )  is  essentially 

%  equivalent  with  matlab's  function  qr,  except  that  we  execute 

%  an  inexpensive  unitary  diagonal  scaling  to  make  the  diagonal 

%  elements  of   R   nonnegative.  It  is  NOT  USEFUL  to  have  Q  in 

%  elementwise  form  to  solve  the   LS   problem,  norm(b  -  Ax)  = 

%  minimum.  The  purpose  of  matlab  having  Q  in  such  form  may  be 

%  to  avoid  having  to  explain  how  the  Householder  least  squares 

%  algorithm  really  works. 
% 

%  Copyright   16   February   1991  by   Bill   Gragg.   All  rights 

%  reserved. 

%  qrhup  calls  sgn. 

%  begin  qrhup 

[n  m]  =  size(R);   s  =  -  sgn (diag(R) ) ;   e  =  ones (n-m, 1) ; 
Q  =  diag([s;e]);   q  =  m+l:n;  z  =  zeros (n-m, 1) ; 

root2  =  sqrt(2); 
for  k  =  min(m,n) : -1 : 1 

q  =  [k  q] ;    u  =  R(q,k) ;    r  =  norm(u) ; 
if  r  >  0 

u  =  u/(r/root2)  ;  T  =  Q(q,q)  ;  Q(q,q)  =  T  -  u*(u'*T)  ; 
end 

R(k,k)  =  r; 
if  k  <  n 

R(k+l:n,k)  =  z ; 
end 

z  =  [  z  ;  0  ]  ; 
end 
%  end  qrhup 


function  W  =  sgn(Z) 

%  w  =  sgn(z)  or  W  =  sgn(Z): 

% 

%  For  z  a  complex  number  w  is   z/abs(z)  if  z   =  0  and  +  1  if 

%  z  =  0.  Thus  sgn  is  the  same  as  matlab's  sign  function  EXCEPT 

%  when  z  =  0.  We  always  have  abs(w)  =  1.  W  is  the  elementwise 

%  (Schur)  sgn  function  of  the  complex  matrix  Z. 

%  Copyright  (c)  19  January  1991  by  Bill  Gragg.  All  rights 
%  reserved. 

%  sgn  calls  no  extrinsic  functions. 

%  begin  sgn 
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W  =  sign(Z) ;    p  =  find(Z  ==  0);    n  =  length(p); 
W(p)  =  ones (n, 1) ; 
end  sgn 

Total  flops  (scalar  case,  see  csgn) : 

Real  case:   0  flops.   Complex   case:   1  sqrt  +  4  mults  +  1 

add. 


function  [x,r]  =  qrhlsl(W,b) 

%  [x  r]  =  qrhlsl(W,b) : 

% 

%  Given  that  the  HOUSEHOLDER  QR  FACTORIZATION  of  A,  is  stored 

%  in  packed  form  in  the  array  W,  qhrs  SOLVES  the  least  squares 

%  problem  norm(b  -  Ax)  =  minimum  for  x.  It  also  efficiently 

%  computes  the  residual  r  :=  (b  -  Ax) .  We  assume  that  rank (A) 

%  =  m  <=  n,  where  A  has  m  columns  and  n  rows. 

% 

%  qhrs  calls  no  extrinsic  functions. 

% 

%  begin  qrhs 

[n  m]  =  size(W);    root2  =  sqrt(2); 
%  Forward  solving:   computing  Q'b  =  H (m) H (m-1) . . .H (1) b. 

for  k  =  l:m 

q  =  k:n;    u  =  W(q,k) ;    rl  =  norm(u) ;    d(k)  =  rl; 
if  rl  >  0 

u  =  u/ (rl/root2) ;  v  =  b(q);  b(q)  =  v  -  u*(u'*v); 
end 

end 
%  Backsolving:   solving  Rx  =  bl  :=  b(l:m)  for  x. 

s  =  -  sgn (diag (W) ) ;    x  =  zeros (m,l); 

x(m)  =  s(m) '*b(m)/d(m) ; 

for  k  =  m-1: -1 : 1 

p  =  [k+i  p];   x(k)  =  (s(k)  «*b(k)  -  w(k,p)  *x(p)  )/d(k)  ; 

end 
%  Computing  the  residual  r. 

if  hi  <  n   r  =  b-  W*x;  else  r  =  0; 

end 
%  end  qrhlsl 


function  [Eb,RSS, F, Fe]  =  regres2 (Q,R, P,RSSp,b,r) 

%  [Eb  RSS  F  Fe]=  regres2 (Q,R, PtRSSp,h, r) : 
% 
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%  Eb  =  E(b),  is  the  expectation  of   beta  of  the  linear  least 

%  squares  problem,  norm(b  -  Ax)  =  minimum,  RSS  is  the  residual 

%  sum  of  squares  and   F   the  statistic   for  a  general  linear 

%  hypothesis.  Also  Fe  is  the  significance  level  to  enter,  SLE, 

%  "F-to-enter",  for  the  new  variable,  on  where  A  =  QRP'  is  the 

%  QR   general   FACTORIZATION  of  A.  p  =  A*x  is  the  orthogonal 

%  projection  of  b  onto  R(A)  and  the  residual  vector  r  =  b  -  pr 

%  =  b  -  Ax,  is  the  orthogonal  projection  of  b  onto  N(A'),  the 

%  orthogonal  complement  of  R(A) .   It  is  assumed  that  A  is  not 

%  a  zero  matrix. 

% 

%  regres2  calls  rows,  cols,  ones  and  norm. 

% 

%  begin  regres2 

Eb  =  r  +  Q*R*P'*x; 

A  =  Q*R*P' ;         n  =  rows (A);  m  =  cols (A); 

e  =  ones(n,l);      RSS  =  r'*r;  v  =  Q'*b; 

f  =  (norm(v) -abs (e ' *b)/sqrt (n) ) * 
(norm (v) +abs (e ' *b) /sqrt (n) ) ; 
df  =  (n-m)/(m-l);   F  =  (df*f)/RSS; 
Fe  =  (abs(RSSp-RSS) *(n-m) )/RSS; 

%  end  regres2 
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APPENDIX  B.  MATLAB  CODES  FOR  GRAGG-LEVEQUE-TRANGENSTEIN ■ S 
ALGORITHM 


function  [Q,R]  =  qruci (Q,R, a, i) 

%  [Q  R]  =  qruci(Q,R,a,i) : 

% 

%  UPDATES  the  QR  factorization  A  =  QR  when  a  is  INSERTED  as 

%  COLUMN  i  of  A.  Q   has  orthonormal  columns  and   R   is  upper 

%  trapezoidal  with  at  least  as  many  columns  as  rows. 

%  Copyright   (c)   20   July  1991  by   Bill   Gragg.   All  rights 
%  reserved, 

%  qruci  calls  qrorth  and  rot. 

%  begin  qruci 

[r  m]  =  size (R) ;  n  =  length (a) ;    [q  s  t]  = 

qrorth (Q, a) ; 

R(:,i+l:m+l)  =  R(:,i:m);    R(:,i)  =  s;       m  =  m  +  1; 
if  r  <  n 

Q  =  [Q  q] ;    R  =  [R;zeros(l,m) ] ; 

r=r+l;    R  ( r ,  i )  =  t ; 
end 
for  k  =  r-l:-l:i 

p  =  k+l:m;  q  =  k:k+l;  [c  s  t]  = 

rot(R(q,i)) ; 

R(q,i)  =  [t;0];        G  =  [c  -  s';s  c ' ] ;    Q(:,q)  = 

Q(:,q)*G; 

R(q,P)  =  G'*R(q,p) ; 
end 
if  i  <  r 

q  =  i+l:r;  d  =  diag(R) ;         d  = 

sgn(d(q) ) ; 

D  =  diag(d);  Q(:,q)  =  Q(:,q)*D;    R(q,:)  = 

D'*R(q, :) ; 
end 
%  end  qruci 


function  [Q,R,a]  =  qrucd(Q,R/i) 
%  [Q  R  a]  =  qrucd(Q,R, i) : 
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%  UPDATES  the  QR  factorization  A  =  QR  when  COLUMN  i  is  DELETED 
%  from  A.  Q  has  orthonormal  columns  and  R  is  upper  trapezoidal 
%  with  at  least  as  many  columns  as  rows.  The  deleted  column  is 
%  called  a  [Ref  19] . 

%  Copyright  (c)  20  July  1991  by  Bill  Gragg.  All  rights 
%  reserved, 

%  qrucd  calls  rot. 

%  begin  qrucd 

if  nargout  >  2      a  =  Q*R(:,i);   end, 

[r  m]  =  size(R);    R(:,i)  =  [];    m  =  m  -  1; 

for  k  =  i:r-l 

p  =  k+l:m;      q  =  k:k+l;      [c  s  t]  =  rot(R(q,k)); 

R(q,k)  =  [t;0];  G  =  [c  -s • ;s  c ' ] ; 

Q( :,q)  =  Q(: ,q)*G; 

if  k  <  m  R(q,P)  =  G'*R(q,p);    end 

end 
if  r  >  m 

r  —  r  -  1 ;         q=l:r; 

Q  =  Q(: ,q) ;         R  =  R(q, :) ; 
else 

s  =  sgn(R(r,r));  Q(:,r)  =  Q(:,r)*s; 

R(r, :)  =  s'*R(r, :) ; 
end 
%  end  qrucd 


function  [Q,R]  =  qruri (Q,R, a, j ) 

%  [Q  R]  =  qruri(Q,R,a, j) : 

% 

%  UPDATES  the  factorization  A  =  QR  when  a'  is  INSERTED  as  ROW 

%  j  of  A.  Q  has  orthonormal  columns  and  R  is  upper  trapezoidal 

%  with  at  least  as  many  columns  as  rows. 

%  Copyright   (c)   20   July  1991  by   Bill   Gragg.   All  rights 

%  reserved, 

%  qruri  calls  rot. 

%  begin  qruri 

m  =  length(a);   [n  r]  =  size(Q); 

n=n+l;       Q(j:n,:)=  [ zeros (l,r);    Q(j:n-1,:)]; 

r  =  r  +  1;      Q(:,r)  =  zeros (n,l);      Q(j,r)  =  1; 

R  =  [R;a' ] ; 

for  k  =  l:r-l 

p  =  k+ 1 :  m ;  q  =  [  k  r  ]  ; 

[est]  =  rot(R(q,k)); 

R(q,k)  =  [t;0];  G  =  [c  -s';s  c • ] ; 

Q(:,q)  =  Q(:,q)*G; 
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if  k  <  m  R(q,p)  =  G'*R(q,p);    end 

end 
if  r  >  m 

r  —  r  -  1 ;  q=l:r; 

Q  =  Q(: ,q) ;  R  -   R(q, :) ; 

else 

s  =  sgn(R(r,r));         Q(:,r)  =  Q(:,r)*s; 

R(r, :)  =  s'*R(r, :) ; 
end 
%  end  qruri 


function  [Q,R,a]  =  qrurd (Q,R, j ) 

%  [Q  R  a]  =  qrurd(Q,R, j) : 

% 

%  UPDATES  the  QR  factorization   A  =  QR   when  ROW  j  is  DELETED 

%  from  A.  Q  has  orthonormal  columns  and  R  is  upper  trapezoidal 

%  with  at  least  as  many  columns  as  rows,  a'  =  A(j,:)  =  Q(j,:)R 

%  is  the  deleted  row. 


%  qrurd  calls  qrorth  and  rot. 

%  Copyright   (c)   20   July  1991  by   Bill   Gragg.   All  rights 

%  reserved, 

%  begin  qrurd 

[n  r]  =  size(Q)  ;    [r  m]  =  size(R);   t  =  [l:j-l  j+l:n]  ; 
if  r  <  n 

q  =  zeros(n,l);         q(j)  =  1;  r  =  r  +  1 ; 

Q(:,r)  =  qrorth(Q,q);    R(r,:)  =  zeros(l,m); 
end 
for  k  =  r-l:-l:l 

p  =  k:m  ;  q  =  [k  r] ; 

[c  s  u]  =  rot(Q(j,q) ) ; 

Q(j,r)  =  u;  G  =  [-s  c' ;c  s' ] ; 

Q(t,q)  =  Q(t,q)*G; 

R(q,p)  =  G'*R(q,p) ; 
end 

r  =  r  -  1;       q  =  l:r;       Q  =  Q(t,q); 
a  =  R(r+1, :) • ;   R  =  R(q, :) ; 

d  =  diag(R) ;     d  =  sgn(d) ;    D  =  diag(d) ;    Q  =  Q*D; 
R  =  D'*R; 
%  end  qrurd 


function  [q,r,s,k]  =  qrorth(Q,a) 
%  [q  r  s  k]  =  qrorth  (Q,  a)  or  q  =  qrorth  (Q) 
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%  ORTHONORMALIZES   column  a  against  the  columns  of   Q   using 

%  REORTHOGONALIZATION.  It  is   assumed  that   Q   has   (nearly) 

%  orthonormal  columns.  Let  [n  m]  =  size(Q).  An  error  message 

%  occurs  if  m  >  n.   For  m  <=  m  we  have 

% 

%  [Q  a]  =  [Q  q][l  r] 

%  [   s]   with   Q'q  =0,   s  >=  0 

% 

%  and,  if  m  <  n, 

Q, 

%  q'q  =  1. 

% 

%  If  m  =  n  we  take   q  :=  0  and   s  :=  0.  If  a  is  not  input  we 

%  take  a  :=  0.  For  m  <  n  and  a  =  0  we  get  a  unit  vector  q  in 

%  the  orthogonal  complement  of  the  range  of  Q. 


%  qrorth  calls  no  extrinsic  functions. 

%  Copyright   (c)   20   July  1991  by   Bill   Gragg.   All  rights 

%  reserved, 

%  begin  qrorth 

[n  m]  =  size(Q); 

if  nargin  <  2   a  =  zeros(n,l);   end 
if  m  >  n   error ('Q  has  too  many  columns.1),   end 
if  m  ==  n   q  =  zeros(n,l);   r  =  Q'*a;   s  =  0;   k  =  0; 
return,   end 

norma  =  norm(a) ;    t  =  norma/2; 

r  =  Q'*a;    b  =  a  -  Q*r;    s  =  norm(b) ;    k  =  1; 

if  norma  ==  0   zflag  =1;   k  =  0;   end 

while  0  ==  0 

if  s  >  t   q=b/s;   break,   end 

if  k  >  4   error (• Process  did  not  terminate  in  5 
iterations.'),   end 

if  s  <=  t*eps/10 

[u  j]  =  min( norms (Q' ) ) ; 
if  s  ==  0 

s  =  eps/4;    b(j)  =  s; 
else 

b(j)  =  b(j)  +  s*eps/4; 
end 

norma  =  s;    k  =  k  +  1/2; 
end 

t  =  s/2;   b  =  b  -  Q*(Q'*b)  ;   s  =  norm(b)  ;   k  =  k  +  1; 
end 
end 

if  zflag   r  =  zeros(m,l);    s  =  0;   end 
%  end  qrorth 
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function    [c,s,r]    =   rot(x,y) 
%    [c    s    r]    =    rot(x,y)    or    [c   s    r]    =    rot(z): 

Q. 

%  Carefully  computes  the  Gram-Schmidt  QR  factorization 

% 

%  z  :=  [x]  =  [c]  r 

%  [y]    [s] 

% 

%  Note  that 

% 

%  [x]  =  [c  -s« ]  [r]  =:  QR 

%  [y]    [s   c»]  [0] 

% 

%  is  a  full  QR  factorization.  This  is  a  "tool  of  the  trade11  in 

%  computational   linear   algebra.  Note  that   Q   and   Q1   are 

%  ROTATIONS  and  that 

% 

%  [  c'  s»]  [x]  =  [r] 

%  [-s   c  ]  [y]    [0]  . 

%  Copyright  (c)  28  October  1990  by  Bill  Gragg.   All  rights 
%  reserved. 

%  rot  calls  no  extrinsic  functions. 

%  begin  rot 

if  nargin  <  2   y=x(2);   x=x(l);   end 
c  =  sign(x) ;    s  =  sign(y) ;    x  =  abs(x) ;    y  =  abs(y) ; 
if  y  >  0 
if  x  >  y 

t  =  y/x;   u  =  sqrt(l  +  t*t)  ;   r  =  x*u;    v  =  t/u; 
u  =  1/u;    c  =  u*c;  s  =  v*s; 

else 

t  =  x/y;   u  =  sqrt(l  +  t*t)  ;   r  =  y*u;    v  =  1/u; 
u  =  t/u;    c  =  u*c;  s  =  v*s; 

end 
else 

r  =  x;    if  r  ==  0   c  =  1;   end 
end 
%  end  rot 

%  Total  flops: 

%     Real  case:      1  sqrt  +   5  mults  +  1  add. 

%     Complex  case:   1  sqrt  +  11  mults  +  3  adds. 
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function  [x,r]  =  qrmgsls (Q,R,b) 

%  [x  r]  =  qrmgsls(Q,R,b) : 

%  Solves  the   norm(b  -  Ax)  =  minimum   for   x   given  the   QR 

%  factorization  A  =  QR   and  computes  the  minimal  norm  rnorm 

%  using  the  MODIFIED  GRAM-SCHMIDT  process  (mgs) . 

% 

%  qrmgss  calls  gfsb. 

%  begin  qrmgsls 

%  Compute  the  "Fourier  coefficients"  c  =  Q'b  and  the  residual 
%  vector  r=b-Ax=b-  QRx  =  b  -  Qc  =  (I  -  QQ')b,  WITHOUT 
%  COMPUTING  x. 

r  =  b; 

for  i  =  l:cols(Q) 

q  =  Q(:,i);    t  =  q'*r;    c  =  [c;t] ; 

end 
%     Compute  r  and  backsolve  for  x. 

x  =  gfsb(R,c) ;    r  =  b  -  Q*c; 
%  end  qrmgsls 
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APPENDIX  C.  MATLAB  CODES  FOR  CHAN'S  ALGORITHM 

function  [Q,R, Pi, rank, W, delta]  =  rrqr(A,tol) 
%  R  =  rrqr(A,tol) 

%  [Q,R, Pi, rank, W, delta]  =  rrqr(A,tol) 
% 

%  Compute  a  rank  revealing  QR  factorization  of  A: 
%     A*Pi  =  Q*R  =  Q  [  R_ll  R_12  ]  , 
%  [   0    R_2  2  ] 

%  such  that  Q  is  m-by-n,  R  is  triangular  n-by-n,  and 
%   1)   R_ll  is  rank-by-rank  and  well-conditioned, 
%   2)   rank  =  numerical  rank  of  A  with  respect  to  tol, 
%       defined  as  the  number  of  singular  values  greater  than 
%       tol, 

%   3)   norm(R_2  2)  is  of  the  same  order  of  magnitude  as  the 
%       (rank+1) *th  singular  value. 
% 

%  Also,  return  a  matrix  W  whose  columns  are  orthonormal  and 
%  span  an  aproximation  to  the  null  "N"  space  of  A,  and  the  6 
%  delta (rank: nl: 2)  containing  lower  and  upper  bounds  for  the 
%  last  n-rank+1  singular  values  of  A  (the  first  rank-1  rows  of 
%  delta  are  zero) . 
% 

%  If  no  tol  is  specified,  sqrt (n) *norm(A, 1) *eps  is  default. 
% 

%  This  program  is  an  implementation  of  the  algorithm  described 
%  in  the  paper:  T.  Chan,  "Rank  Revealing  QR  factorizations", 
%  Lin.  Alg.  Appl .  88/89  (1987),  67-82. 

%  The  use  of  the  factor  c_max_ratio  was  suggested  in:  C.  H. 
%  Bischof,  &  P.  C.  Hansen,  "Structure  preserving  and  rank 
%  revealing   QR-factorizations" ,   SISSC,  to   appear 

[m,n]  =  size(A);  delta  =  zeros(n,2);  c_max_ratio  =  10; 

if  (nargin==l) ,  tol  =  sqrt (n) *norm(A, 1) *eps;  end 

if  (nargout>4) ,  W  =  [];  end 

%  Compute  an  initial  QR  factorization  A*Pi  =  Q*R. 
[Q,R,Pi]  =  qr(A);  Q  =  Q(:,l:n);  R  =  R(l:n,:); 

%  Prepare  for  the  iterations.  Estimate  smallest  singular 
%  value  of  R. 

nu  =  0 ;  i  =  n ; 

[sest,v]  =  ccvl(R);  if  (nargout>5) ,  delta(n,l)  =  sest;  end 

%  Loop  until  a  singular  value  estimate  larger  than  tol  is 
%  found . 

while  sest  <  tol,  nu  =  nu+1; 

%  Update  the  matrix  W. 
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if  (nargout>4),  W  =  [ [v; zeros (nu-1, 1) ] ,W] ;  end 

%  Find  the  element  in  v  with  greatest  index,  numerically 

%  within  a  factor   c_max_ratio   of  the   numerically   largest 

%  element. 

vmax  =  norm(v,inf); 

for  k=i:-l:l 

if  (vmax  <=  abs (v(k) ) *c_max_ratio) ,  break,  end 

end 

%  If  necessary,  generate  the  permutation  that  brings  the  pivot 
%  element  of  v  to  the  last  position,  apply  the  permutation  to 
%  W,  Pi,  and  R,  compute  a  new  QR  factorization  of  R(l:i,l:i), 
%  and  update  Q  and  R. 
if  (k<i) 

p  =  [k+l:i,k];  Pi(:,k:i)  =  Pi(:,p);  R(:,k:i)  =  R(:,p); 
if  (nargout>4),  W(k:i,:)  =  W(p,:);  end 
for  j=k:i-l 

[c,s]  =  gen_g(R(j , j :n) ,R(j+l, j:n) ) ; 
[R(j,j:n) ,R(j+l,j:n) ]  = 
app_g_left(c,s,R(j, j:n) ,R( j+1, j :n) ) ; 
R(j+l,j)  =  0; 

[Q(:,J),Q(:,J+1)]  =  app_g_right (c, s,Q( : , j ) ,Q( : , j+1) ) ; 
end 
end 

%  Provide  an  upper  bound  for  the  i'th  singular  value, 
if  (nargout>5) ,  delta(i,2)  =  norm(R(i:n, i:n) ) ;  end 

%  Estimate  the  smallest  singular  value  of  R(l: i-1 , 1: i-1) , 
%  which  is  a  lower  bound  for  the  (i-1) 'th  singular  value. 

i  =  i-1;  [sest,v]  =  ccvl (R(l: i, 1: i) ) ; 

if  (nargout>5) ,  delta(i,l)  =  sest;  end 

end 

%  Finish  the  computation.   If  nargout  <  2,  return  R. 

if  (nargout>5) ,  delta(i,2)  =  norm (R (rank: n, rank:n) ) ;  end 
rank  =  n  -  nu;  if  (nargout>4),  W  =  Pi*W;  end 
if  (nargout  <  2) ,  Q  =  R;  end 

%  This  algorithm  is  described  by  T.  Chan  &  P.  Hansen  [Ref  15]  . 


function  [x,r]  =  basic(Q,R, Pi ,b, rank) 
% 

%  [x  r]  =  basic(Q,R,Pi,b,rank) 
% 

%  Compute  the  basic  solution.   If  the  RRQR  of  A  is 
%     A*Pi  =  Q*R  =  Q  [  R_ll  R_12  ]  , 
%  [   0    R_22  ] 
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%  where  All  is  rank-by-rank,  then  the  basic  solution  is 
%    x  =  Pi*[  inv(Rll)  0  ]*Q'*b  . 
%  [00] 

%  Ref . :  G.  H.  Golub  &  C.  F.  Van  Loan,  "Matrix  Computations", 
%  Johns  Hopkins,  1989.   Subsection  5.5.6. 

%  Per  Christian  Hansen,  UNI-C,  07/11/90. 

x  =  Pi(: , 1: rank) *(R(1: rank, 1: rank) \(Q(: ,l:rank) ' *b) ) ; 
r  =  b  -  Q*R*Pi*x; 
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